(54) A process for the production of biologically active peptide via the expression of modified
storage seed protein genes in transgenic plants 



(57) The invention pertains to a process for produc- 
ing a determined polypeptide of interest or repeats 
thereof in a seed forming plant. It comprises : 

cultivating plants obtained from regenerated plant 
cells or from seeds of plants obtained from said 
regenerated plant cells over one or several genera- 
tions, whose genetic patrimony, replicable with said 
plants, comprises a precursor-coding nucleic acid 
sequence encoding the precursor of a plant storage 
protein and placed under the control of a seed pro- 
moter, said precursor-coding nucleic acid being 
modified in a non-essential region of its relevant 
sequence which encodes the mature storage pro- 
tein or a sub-unit thereof with a nucleic acid insert in 
appropriate reading phase relationship with the sur- 
rounding part of said relevant sequence, said insert 
including a determined segment encoding an heter- 
ologous determined polypeptide of interest or 
repeats thereof linked to each other and down- 
stream and upstream thereof to the remainder 
parts of said relevant sequence through codons 
encoding aminoacid(s) which define selectively 
cleavable border sites surrounding the peptide of 
interest in the hybrid storage protein encoded by 
the so-modified relevant sequence of said precur- 
sor-coding nucleic acid; 



recovering the seeds of the cultivated plants and 
extracting the hybrid storage proteins contained 
therein, 

cleaving out the peptide of interest from said hybrid 
storage protein, purifying and recovering the pep- 
tide of interest. 

The polypeptide of interest may be a biologically 
active protein and/or a labeled protein.

Description

The invention relates to a process for the production of useful biologically active polypeptides through the modifi- 
cation of appropriate plant genes. 

The production of determined biologically active polypeptides in easily purifiable form and useful quantities is still
fraught, in most instances, with considerable difficulties.

Alternative procedures are chemical synthesis or production by genetically engineered microorganisms. The first 
is very expensive and often does not result in polypeptides with the correct conformation. The latter alternative is difficult
due to problems of instability of the polypeptide, intracellular precipitation, and purification of the product in a pure
form. In addition, some classes of peptides, including hormonal peptides, are fully active only after further processing
such as correct disulfide bridge formation, acetylation, glycosylation or methylation. In nature disulfide bridges are 
formed with high efficiency because they are co-translationally catalysed by protein disulfide isomerase during mem- 
brane translocation of the precursors. The active form is then derived from the precursor by proteolytic cleavage proc- 
esses. 

Peptides chemically synthesised or overproduced in prokaryotic systems are generally obtained in a reduced form,
and the disulfide bridges must then be formed by mild oxidation of the cysteine residues. Since one often starts from 
the fully denatured "scrambled" state of the peptide, disulfide bridge formation is then a random process, during which 
intermolecular bridges (yielding higher molecular weight aggregates) and incorrect disulfide bonds (yielding inactive 
peptides) may be generated in addition to the correctly folded peptide. 

Using plant cells as systems for the production of determined peptides has also been suggested, e.g. in
PCT/US86/01599. There is no evidence in that patent that the suggested methods, whose principle lies in bringing said
peptide constitutively to expression according to known techniques (EP83112985.3), permit obtaining high expression
levels without disturbing the plant physiology and high yields in recovering said peptides by separating them from plant 
proteins. This will especially be the case when the whole plant is used as such and grown in soil. 

An object of the invention is to overcome these difficulties, to provide economically valuable processes and genetically
engineered live matter which can be produced in large amounts, in which determined polypeptides can both be
synthesized in large amounts without disturbing the physiology of said live matter and produced in a form providing for 
a high degree of physiological activity common to the wild type peptide having the same or substantially the same 
amino acid sequences and can be easily recovered from said live matter. 

More particularly the invention aims at providing genetically modified plant DNA and plant live material including
said genetically modified DNA replicable with the cells of said plant material, which genetically modified plant DNA contains
sequences encoding said determined polypeptides whose expression is under the control of a given plant promoter
which conducts said expression in at least a stage of the development of the corresponding plants. This stage of
development is chosen in a way that the expression occurs in plant organs or tissue which are produced in high
amounts and easily recoverable.

A further object of the invention is to take advantage of the capacity of seed storage proteins to be produced in 
large amounts in plants and to be expressed at a determined stage of development of said plants, particularly at the 
seed formation stage. More particularly the invention aims at taking advantage of the ease with which water soluble 
storage proteins can be recovered from the corresponding plant seeds. 

The expression of foreign genes in plants is well established (De Blaere et al., 1987). In several cases seed storage
protein genes have been transferred to other plants. In several cases it was shown that within its new environment the
transferred seed storage protein gene is expressed in a tissue specific and developmentally regulated manner (Beachy
et al., 1985 ; Okamuro et al., 1986 ; Sengupta-Gopalan et al., 1985 ; Higgins et al., 1986). This means that the transferred
gene is expressed only in the appropriate parts of the seed, and only at the normal time. It has also been shown
in at least one case that foreign seed storage proteins are located in the protein bodies of the host plant (Greenwood
and Chrispeels, 1985). It has further been shown that stable and functional messenger RNAs can be obtained if a
cDNA, rather than a complete gene including introns, is used as the basis for the chimeric gene (Chee et al., 1986).

Seed storage proteins represent up to 90 % of total seed protein in seeds of many plants. They are used as a
source of nutrition for young seedlings in the period immediately after germination. The genes encoding them are
strictly regulated and are expressed in a highly tissue specific and stage specific fashion (Walling et al., 1986; Higgins,
1984). Thus they are expressed almost exclusively in developing seed, and different classes of seed storage proteins
may be expressed at different stages in the development of the seed. They are generally restricted in their intracellular
location, being stored in membrane bound organelles called protein bodies or protein storage vacuoles. These
organelles provide a protease-free environment, and often also contain protease inhibitors. These proteins are
degraded upon flowering, and are thought to serve as a nutritive source for developing seeds. Simple purification techniques
for several classes of these proteins have been described.

Seed storage proteins are generally classified on the basis of solubility and size (more specifically sedimentation
rate, for instance as defined by Svedberg (in Stryer, L., Biochemistry, 2nd ed., W.H. Freeman, New York, page 599)). A
particular class of seed storage proteins has been studied, the 2S seed storage proteins, which are water soluble albumins
and thus easily separated from other proteins. Their small size also simplifies their purification. Several 2S storage
proteins have been characterised at either the protein or cDNA levels (Crouch et al., 1983 ; Sharief and Li, 1982 ; Ampe
et al., 1986 ; Altenbach et al., 1987 ; Ericson et al., 1986 ; Scofield and Crouch, 1987 ; Josefsson et al., 1987 ; and work
described in the present application). 2S albumins are formed in the cell from two sub-units of 6-9 and 3-4 kilodaltons
(kd) respectively, which are linked by disulfide bridges.

The work in the references above showed that 2S albumins are synthesized as a complex prepropeptide whose
organization is shared between the 2S albumins of many different species and is shown diagrammatically for three of
these species in figure 1. Several complete sequences are shown in figure 2.

As to fig. 2 relative to protein sequences of 2S albumins, the following observations are made. For B. napus, B.
excelsa and A. thaliana both the protein and DNA sequences have been determined. For R. communis only the protein
sequence is available (B. napus from Crouch et al., 1983 and Ericson et al., 1986 ; B. excelsa from Ampe et al., 1986,
de Castro et al., 1987 and Altenbach et al., 1987 ; R. communis from Sharief et al., 1982). Boxes indicate homologies,
and raised dots the position of the cysteines.

Comparison of the protein sequences at the beginning of the precursor with standard consensus sequences for
signal peptides reveals that the precursor has not one but two segments at the amino terminus which are not present
in the mature protein, the first of which is a signal sequence (Perlman and Halvorson, 1983) and the second of which
has been designated the amino terminal processed fragment (the so called ATPF). Signal sequences serve to ensure
the cotranslational transport of the nascent polypeptide across the membrane of the endoplasmic reticulum (Blobel,
1980), and are found in many types of proteins, including all seed storage proteins examined to date (Herman et al.,
1986). This is crucial for the appropriate compartmentalization of the protein. The protein is further folded in such a way
that correct disulfide bridges are formed. This process is probably localized at the luminal side of the endoplasmic
reticulum membrane, where the enzyme disulfide isomerase is localized (Roden et al., 1982; Bergman and Kuehl,
1979). After translocation across the endoplasmic reticulum membrane it is thought that most storage proteins are
transported via said endoplasmic reticulum to the Golgi bodies, and from the latter in small membrane bound vesicles
("dense vesicles") to the protein bodies (Chrispeels, 1983; Craig and Goodchild, 1984 ; Lord, 1985). That the signal
peptide is removed cotranslationally implies that the signals directing the further transport of seed storage proteins to
the protein bodies must reside in the remainder of the protein sequence present.

2S albumins contain sequences at the amino end of the precursor other than the signal sequence which are not
present in the mature polypeptide. This is not general to all storage proteins. This amino terminal processed fragment
is labeled Pro in Fig. 1 and ATPF in figure 1A.

In addition, as shown in figure 1 and 1A, several amino acids located between the small and large sub-units in the
precursor are removed (labeled link in Fig. 1 and IPF in figure 1A, which stands for internal processed fragment). Furthermore,
several residues are removed from the carboxyl end of the precursor (labeled Tail in Fig. 1 and CTPF in figure
1A, which stands for carboxyl terminal processed fragment). The cellular location of these latter process steps is uncertain,
but is most likely the protein bodies (Chrispeels, 1983 ; Lord, 1985). As a result of these processing steps the small
sub-unit (Sml. Sub) and large sub-unit remain. These are linked by disulfide bridges, as discussed below.

When the protein sequences of 2S-albumins of different plants are compared strong structural similarities are 
observed. This is more particularly illustrated by figure 2 and 2A, which provide the aminoacid sequences of the small 
sub-unit and large sub-unit respectively of representative 2S storage seed albumin proteins of different plants, i.e. : 

R. comm. : Ricinus communis

A. thali. : Arabidopsis thaliana 

B. napus : Brassica napus 

B. excel. : Bertholletia excelsa (Brazil nut) 
It must be noted that in fig. 2 and 2A :

the aminoacid sequences of said sub-units extend on several lines ;

the cysteine groups of the aminoacid sequences of the exemplified storage proteins and identical aminoacids in
several of said proteins have been brought into vertical alignment ; the hyphen signs which appear in some of these
sequences represent absent aminoacids, in other words direct linkages between the closest aminoacids which surround
them ;

the aminoacid sequences which in the different proteins are substantially conserved are framed.

It will be observed that all the sequences contain eight cysteine residues (the first and second ones in the small
sub-unit, the remainder in the large sub-unit) which can participate in disulfide bridges as diagrammatically shown in
fig. 3, which represents a hypothetical model (for the purpose of the present discussion) rather than a representation of
the true structure proven by experimentation of the 2S-albumin of Arabidopsis thaliana. Said hypothetical model has
been inspired by the disulfide bridge mediated loop-formation of animal albumins, such as serum albumins (Brown,
1976), alpha-fetoprotein (Jagodzinski et al., 1987; Morinaga et al., 1983) and the vitamin D binding protein where analogous
constant C-C doublets and C-X-C triplets were observed (Yang et al., 1985).

Furthermore, the distances between the cysteine residues are substantially conserved within each sub-unit, with
the exception of the distance between the sixth and seventh cysteine residues in the large sub-unit. This suggests that 
these arrangements are structurally important, but that some variation is permissible in the large sub-unit between said 
sixth and seventh cysteines. 

The invention is based on the determination of the regions of the storage protein which can be modified without an
attendant alteration of the properties and correct processing of said modified storage protein in plant seeds of transgenic
plants. This region (diagrammatically shown in fig. 3 by an enlarged hatched portion) will in the examples hereafter
be referred to as the "hypervariable region". Fig. 3 also shows the respective positions of the other parts of
the precursor sequence, including the "IPF" section separating the small sub-unit and large sub-unit of the precursor,
as well as the number of aminoacids (aa) in substantially conserved portions of the protein sub-units cystein residues.
The processing cleavage sites are shown by symbols T.

The seeds of many plants contain albumins of approximately the same size as the storage proteins discussed 
above. However, for ease of language the term "2S albumins" will be used herein to refer to seed proteins whose genes 
encode a peptide precursor with the general organization shown in figure 1 and which are processed to a final form consisting
of two subunits linked by disulfide bridges. This is not to be construed as indicating that the process described
below is exclusively applicable to such 2S albumins.

The process of the invention for producing a determined polypeptide of interest comprises : 

cultivating plants obtained from regenerated plant cells or from seeds of plants obtained from said regenerated
plant cells over one or several generations, wherein the genetic patrimony or information of said plant cells, replicable
within said plants, includes a nucleic acid sequence, placed under the control of a seed-specific promoter,
which can be transcribed into the mRNA encoding at least part of the precursor of a storage protein including the
signal peptide of said plant, said nucleic acid being hereafter referred to as the "precursor encoding nucleic acid",

wherein said nucleic acid contains a nucleotide sequence (hereafter termed the "relevant sequence"), which
relevant sequence comprises a non essential region modified by a heterologous nucleic acid insert forming an
open reading frame in reading phase with the non modified parts surrounding said insert in said relevant
sequence,

wherein said insert includes a nucleotide segment encoding said polypeptide of interest,

wherein said heterologous nucleotide segment is linked to the adjacent extremities of the surrounding non
modified parts of said relevant sequence by one or several codons whose nucleotides belong either to said
insert or to the adjacent extremities or to both,

wherein said one or several codons encode one or several aminoacid residues which define selectively cleavable
border sites surrounding the peptide of interest in the hybrid storage protein or storage protein sub-unit
encoded by the modified relevant sequence ;

recovering the seeds of the cultivated plants and extracting the hybrid storage proteins contained therein,

cleaving out the peptide of interest from said hybrid storage protein at the level of said cleavage sites; and

recovering the peptide of interest in a purified form.


It will be appreciated that under the above-mentioned conditions each and every cell of the cultivated plant will 
include the modified nucleic acid. Yet the above defined recombinant or hybrid sequence will be expressed at high lev- 
els only or mostly in the seed forming stage of the cultivated plants and, accordingly, the hybrid protein produced mostly 
in the seeds. 

It will be understood that the "heterologous nucleic acid insert" defined above consists of an insert which contains
nucleotide sequences which, at least in part, are foreign to the natural nucleic acid encoding the precursor of the storage
protein of the seeds or plant cells concerned. Most generally the segment encoding the polypeptide of interest will itself
be foreign to the natural nucleic acid encoding the precursor of said storage protein. Nonetheless, the term "heterologous
nucleic acid insert" does also extend to an insert containing a segment as above-defined normally present in the
genetic patrimony or information of said seeds or plant cells, the "heterologous" character of said insert then pertaining
to the one or several codons which surround it on both sides and which link said segment to the non-modified
parts of the nucleic acid encoding said precursor. Under such last mentioned circumstances the invention thus provides
for a method which enables the production and easy separation and recovery of a valuable protein normally produced
in the plant itself, either at the seed forming stage or at any other stage of the development of the plant, and either in
the protein bodies of the seeds or any other location of said plant cells.

The "polypeptide of interest" will usually consist of a single polypeptide, or protein which, when cleaved out from 
the hybrid storage proteins in the final stages of the process of this invention, will retain or resume at least those of the 
biological properties sought to be possessed by that single polypeptide or protein of interest. By way of non limitative 
examples of properties sought to be retained by the polypeptide of interest, one may cite, e.g. enzymatic or therapeutic
activities, the capability of being recognized by determined antibodies, immunogenic properties, for instance the capability
of eliciting in a living host antibodies which are able to neutralize such peptide of interest or a pathogenic agent
containing antigens including the same or an analogous sequence of aminoacids as said "polypeptide of interest".
However the "polypeptide of interest" may also comprise repeats of a unit, particularly of an individual peptide or
polypeptide having any desired biological activity, said units being joined with one another over or through cleavable
sites permitting the separation of the biologically active repeats or units from one another. Though not decisive, such cleavable
sites are advantageously identical to or sensitive to the same cleaving means, e.g. a determined restriction
enzyme, as the above-defined "border cleavage sites" which enable the overall "polypeptide of interest" to be cleaved
out from the hybrid storage protein. As a matter of fact, separation of the active units from one another may then be
achieved simultaneously with the above mentioned "cleaving out" operations. Yet the different units or repeats may be
joined through different cleavage sites, whereby the separation of said units from one another may be undertaken subsequent
to the "cleaving out" operations of said "polypeptide of interest" from the hybrid storage protein.

The number of repetitive units in the polypeptide of interest will of course be dependent upon the maximum length
of polypeptide of interest which may be incorporated in the storage protein concerned under the conditions defined
herein.

In the preceding definition of the process according to the invention the so-called "non-essential region" of the relevant
sequence of said nucleic acid encoding the precursor consists of a region whose nucleotide sequence can be
modified either by insertion into it of the above defined insert or by replacement of at least part of said non-essential
region by said insert, yet without modifying the resulting overall configuration of said hybrid storage protein as compared
to that of the non-modified natural storage protein as well as the transport of the correspondingly modified nascent
hybrid storage protein into the abovesaid protein bodies.

In the present invention the precursor-coding nucleic acid referred to above may of course originate from the same 
plant species as that which is cultivated for the purpose of the invention. It may however originate from another plant 
species, in line with the teachings of Beachy et al., 1985 and Okamuro et al., 1986 already of record.

In a similar manner the seed-specific promoter may originate from the same plant species or from another, subject
in the last instance to the capability of the host plant's polymerases to recognize it. 

Any method for the location of a non-essential region in a storage protein can be used. Once this region is defined
at the protein sequence level, the corresponding region of the precursor encoding nucleic acid can be altered. For
instance, non-essential regions can be located using methods based on the establishment of secondary and tertiary
protein structures by molecular modeling. Such models will allow the identification of regions of the protein critical for
its configuration or interaction in higher order aggregations. In the absence of such technology, the peptide sequences
of analogous proteins from various plant species can be compared. Those subsequences which said peptide
sequences have in common (and which prima facie will support the presumption that they cannot be modified without
affecting the structure, processing, intracellular passage, or packaging of the peptide in a deleterious way) can be distinguished
from those which are so different from one another as to support the assumption that they may consist of
"non-essential regions" which may then be deemed to be eligible for modification by a determined heterologous insert.

Such an approach is possible when the protein or nucleic acid sequences of several similar storage proteins originating
from different plants have been determined (as is the case for the 2S albumins). A suitable method then comprises
identifying said nucleic acid regions which encode peptide regions undergoing variability in either amino acid
sequence or length or both, as compared with the regions which, on the contrary, do exhibit substantial conservation of
amino acid sequence between said several plant species. Where the storage proteins under study contain cysteine residues
and where further it is thought or known through experimental data that said cysteines participate in disulfide
bridges likely to play an important part in the establishment of the structure and conformation of the storage proteins
concerned, the method should be extended to take this into account. In this case, the cysteine residues should not be
among those residues altered by the modification of the storage protein, and where sequence comparison of protein
sequences of analogous proteins shows that the distance (in amino acid residues) between cysteines is conserved, this
distance should not be altered by any subsequent modification. The said non-essential regions in the protein sequence
so selected can then be modified by insertion, into the corresponding region of the precursor-coding nucleic acid, of the
nucleic acid segments encoding the desired peptide product and, after said modification has been achieved, the
expression of the modified storage protein in the seeds recoverable at the seed-forming stage of plant development can
be assayed.
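
By way of non limitative illustration, the comparison described above can be sketched in Python. The sketch assumes that the sequences have already been brought into alignment (hyphens standing for absent aminoacids, as in fig. 2 and 2A). The function name `variable_regions`, the `guard` margin around cysteines, the `min_len` threshold and the toy alignment are all hypothetical and are not taken from the figures; they merely show one possible way of flagging candidate non-essential regions.

```python
from typing import List, Tuple


def variable_regions(aligned: List[str], min_len: int = 2,
                     guard: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that vary between
    the aligned sequences and stay more than `guard` columns away from any
    column containing a cysteine. Both thresholds are illustrative."""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    columns = ["".join(seq[i] for seq in aligned) for i in range(width)]
    cys_cols = [i for i, col in enumerate(columns) if "C" in col]

    def eligible(i: int) -> bool:
        col = columns[i]
        conserved = len(set(col)) == 1 and "-" not in col
        near_cys = any(abs(i - c) <= guard for c in cys_cols)
        return not conserved and not near_cys

    regions: List[Tuple[int, int]] = []
    start = None
    for i in range(width + 1):
        if i < width and eligible(i):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Toy alignment (hypothetical sequences, NOT the sequences of fig. 2).
toy = [
    "ACQQLKGKQQMVRRCA",
    "ACQQLK-----VRRCA",
    "ACQQLKG-EQ-VRRCA",
]

if __name__ == "__main__":
    print(variable_regions(toy))
```

Columns lying within the guard margin of a cysteine are never reported, so that any modification made in a reported range leaves the cysteines and their immediate neighbours untouched. Whether the spacing between cysteines must also be held constant remains a separate check to be made against the sequence comparison.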

Another method, which is within the skills of a person skilled in the art, for determining whether a region is
amenable to modification consists in making such a modification and expressing the chimeric gene in any one of
several expression systems which, while not appropriate to produce economically interesting amounts of the chimeric
protein, will, if the chimeric protein is stable, produce small quantities for analysis. In such experiments, the unmodified
protein should also be brought to expression as a control. Such systems include, but are not limited to, Xenopus
laevis oocytes (Bassener et al., 1983), transient expression in plant protoplasts (Fromm et al., 1985), yeast (Hollenberg
et al., 1985), plant callus and the Acetabularia system. The latter has been used by Brown et al. (1986) for the functional
analysis of zein genes and their modification by sequences encoding lysine.

The choice of precursor-coding nucleic acids encoding the precursors of 2S-proteins, particularly water-soluble 2S-proteins,
for the production of the modified nucleic acids to be transferred into the plant cells to be modified is particularly
attractive for the reasons already of record.

As can be seen on figs 2 and 2A, the regions which are intercalated between the first and second cysteines in the
small sub-unit of the protein, between the fifth and sixth cysteines, on the one hand, and between the seventh and
eighth cysteines in the large sub-unit of the protein show a substantial degree of conservation or similarity. It would thus
seem that these regions are in some way essential for the proper folding and/or stability of the protein when synthesized
in the plant seeds.

To the contrary other regions, such as at the end of the small sub-unit and at the beginning or end of the large sub-unit,
show differences of such a magnitude that they can be held as presumably having no substantial impact on the final
properties of the protein. A region which does not seem essential consists of the middle position of the region located
in the large sub-unit, between the sixth and the seventh cysteine of the mature protein. As visible on the drawing (Fig. 2)
B. napus comprises a GKQQM sequence between the Q aminoacid which precedes it and the V aminoacid which follows
it, whereas at the same level A. thali. has no similar sequence at all between the same neighbouring aminoacids
and B. excel. and R. comm. comprise shorter GEQ and GQ peptides respectively.

Thus it appears that, in addition to the absence of similarity at the level of the aminoacid residues, there is a difference
in length which makes that region eligible for substitutions in the longest 2S albumins and for addition of aminoacids
in the shortest 2S albumins or for elongation of both.

The same observations should extend to a position at approximately the end of the first third of the same
region between said sixth and seventh cysteine: see the sequence of R. communis, which is much shorter in that region
than the corresponding regions of the other exemplified 2S-proteins.

Experimentation, which is within the skills of the person skilled in the art, will show how many of the other aminoacids
which neighbour the abovesaid sixth and seventh cysteine of the mature protein could further be substituted without
causing disturbance of the stability and correct processing of the hybrid protein. For instance experimentation will show
how many of the other aminoacids which neighbour the abovesaid GKQQM sequence of B. napus, upstream and downstream
thereof, could further be substituted without loss by the hybrid protein of the essential properties of the normal
B. napus 2S albumin. The modifications contemplated should preferably not affect the three, preferably six, aminoacids
adjacent to the relevant cysteines, e.g. the sixth and seventh cysteines of the mature 2S protein.

It is of course realized that caution must be exercised against hypotheses based on arbitrary choices as concerns
the bringing into line of similar parts of proteins which elsewhere exhibit substantial differences. Nevertheless such
comparisons have proven in other domains of genetics to provide the man skilled in the art with appropriate guidance
to reasonably infer from local structural differences, on the one hand, and from local similarities, on the other hand, in
similar proteins of different sources, which parts of such proteins can be modified and which parts cannot, when it is
sought to preserve some basic properties of the non modified protein in the same protein yet locally modified by a foreign
or heterologous sequence.

Thus it is prima facie deemed that, subject to verification, any part of a protein or of a subunit thereof may be 
deemed as eligible for substitution by a peptide having a different aminoacid sequence. 

The choice of the adequate non-essential regions to be used in the process of the invention will also depend on the
length of the peptide of interest. Basically the method of the invention thus allows the production of biologically active
polypeptides in the range of 3-100 aminoacids in length. This biologically active polypeptide may have a vegetal origin
or may be a non plant variety specific polypeptide having a bacterial origin or a fungal origin or an algal origin or an
invertebrate origin or a vertebrate origin such as a mammalian origin.

The sequence (insert) to be inserted in the appropriate regions of the relevant sequence encoding the storage protein, e.g. a 2S
protein, or a sub-unit thereof, does not, normally, include only the segment coding this polypeptide of interest, but also
the codons (or parts thereof when the contiguous nucleotides of the non-modified parts of the relevant nucleotide
sequences of the precursor-coding nucleic acid happen to adequately supplement the codons) encoding aminoacids
or peptides which form the abovesaid aminoacid junctions cleavable, e.g., by protease or chemical treatment, so that
the peptide of interest can later be recovered from the purified 2S protein. The junction-sequences can be made either
as a double stranded oligomer or, if part of a gene is available, as a restriction fragment, but in the latter case the cleavage
sites, e.g. protease cleavage sites, must generally be added.

The choice of sequences bordering the peptide of interest depends on several factors, which essentially depend on
the techniques to be used for purifying that peptide in the final stages of the process. The peptide of interest can be
flanked by any proteolytic cleavage sites, provided that the sequence of the peptide of interest does not contain similar
internal cleavage sites. Finally, the proteases and/or chemical cleavage reagents should be specific and readily
available. They should correctly cleave the inserted sequence at both the amino and carboxyl termini. For example, the
protease trypsin cleaves after Arginine or Lysine residues, provided they are not followed by a Proline. Thus if no
Arginine or Lysine residues are present in the peptide of interest (or if any that are present are followed by a Proline),
the sequence can be flanked by codons encoding one of those two amino acids. The peptide can then be cleaved out of the
hybrid protein using trypsin, followed by treatment with the exoprotease carboxypeptidase B to remove the extra carboxyl
terminus Arg or Lys. Similarly, the protease endo-Lys-C (Jekel et al., 1983) cleaves after Lysine residues, so that a
peptide could be inserted between two such residues, cleaved from the 2S albumin using this protease, and the extra
Lysine again removed using carboxypeptidase B. Such a strategy is particularly useful when the 2S albumin is used, as the
latter is poor in Lysine, so that only a few fragments are generated, resulting in easy purification. Cyanogen bromide
serves as an example of a chemical cleavage reagent. Treatment with this reagent cleaves on the carboxyl side of Methionine.
Thus, for each case a separate strategy must be developed, but the wide variety of protease cleavage techniques
available allows the same basic principles to be followed. As often as possible, strategies should use economical,
commercially available proteases or reagents, and a limited number of purification steps. For reviews of various
enzymatic and chemical cleavage techniques see volumes 19 (1970) and 47 (1977) of Methods in Enzymology.
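
By way of illustration only, the requirement that the peptide of interest contain no internal cleavage site for the chosen reagent can be checked with a short script. The following Python sketch encodes only the simplified rules stated above (trypsin after Lys or Arg unless followed by Pro, endo-Lys-C after Lys, cyanogen bromide after Met); the function names and the rule table are assumptions made for this illustration and do not form part of the disclosed process.

```python
import re

# Simplified cleavage rules as stated in the text; real reagents show
# additional sequence preferences that this sketch ignores.
CLEAVAGE_RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),  # after Lys/Arg, not before Pro
    "endo-Lys-C": re.compile(r"(?<=K)"),       # after Lys
    "CNBr": re.compile(r"(?<=M)"),             # carboxyl side of Met
}


def internal_sites(peptide, reagent):
    """Return the 1-based residue positions after which `reagent` would cut
    inside `peptide` (a cut after the last residue is not internal)."""
    rule = CLEAVAGE_RULES[reagent]
    return [m.start() for m in rule.finditer(peptide) if m.start() < len(peptide)]


def usable_reagents(peptide):
    """List the reagents that leave the peptide of interest intact."""
    return [name for name in CLEAVAGE_RULES if not internal_sites(peptide, name)]
```

For Leu-enkephalin (YGGFL), `usable_reagents("YGGFL")` returns all three reagents, since the peptide contains no Lys, Arg or Met; any of the corresponding border residues could therefore be considered under these simplified rules.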

Finally, some peptides are found in nature with C-terminal alpha-amide structures (alpha-melanotropin, calcitonin,
and others; see Hunt and Dayhoff, 1976). This post-translational modification has been shown to be of essential
importance for the biological activity of the peptide. Such a C-terminally amidated peptide can be obtained by
transformation of a C-terminal glycine residue into an amide group (Seiringer et al., 1985). Therefore such peptides can
be generated from the 2S hybrid protein by adding a C-terminal glycine residue to the peptide which, after purification,
is transformed into an amide group.

When the complete protein sequence of the region to be inserted into the storage protein has been determined,
including both the polypeptide of interest and the aminoacids or peptides which form the above described cleavable
junctions, the nucleotide sequence to encode said protein sequence must be determined. It will be recognized that,
while perhaps not absolutely necessary, the codon usage of the encoding nucleic acid should where possible be similar
to that of the gene being modified. The person skilled in the art will have access to appropriate computer analysis tools
to determine said codon usage.
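
As a non-limiting illustration of such a computer analysis, the Python sketch below tabulates the codon usage of a coding sequence and back-translates a peptide using, for each aminoacid, the codon most frequently observed in that sequence. The function names are hypothetical, the fallback for aminoacids absent from the reference sequence is an arbitrary choice, and the input is assumed to contain only the letters A, C, G and T; this is one simple approach among many, not the tool used by the applicant.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_usage(cds):
    """Count, per aminoacid, how often each codon occurs in `cds`."""
    usage = defaultdict(Counter)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        usage[CODON_TABLE[codon]][codon] += 1
    return usage


def back_translate(peptide, usage):
    """Encode `peptide` with the most frequent codon seen for each residue."""
    dna = []
    for aa in peptide:
        counts = usage.get(aa)
        if counts:
            dna.append(counts.most_common(1)[0][0])
        else:
            # Residue not observed in the reference gene: arbitrary fallback.
            dna.append(sorted(c for c, a in CODON_TABLE.items() if a == aa)[0])
    return "".join(dna)


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter aminoacids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))
```

In practice the reference sequence passed to `codon_usage` would be the coding region of the storage protein gene being modified, and the result of `back_translate` can be verified by passing it to `translate`.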

Any appropriate genetic engineering technique may be used for substituting the insert for part of the selected
precursor-coding nucleic acid, or for inserting it in the appropriate region of said precursor-coding nucleic acid. The
general in vitro recombination techniques followed by cloning in bacteria can be used for making the chimeric genes.
Site-directed mutagenesis can be used for the same purposes, as further exemplified hereafter. DNA recombinants, e.g.
plasmids suitable for the transformation of plant cells, can also be produced according to techniques disclosed in current
technical literature. The same applies finally to the production of transformed plant cells in which the hybrid storage
protein encoded by the relevant parts of the selected precursor-coding nucleic acid can be expressed. By way of example,
reference can be made to the published European application No. 116 718 or to international application WO 84/02913
(incorporated herein by reference), which disclose appropriate techniques to that effect.

The preceding discussion has been based more specifically, by way of example, on the modification of storage 2S
albumin. It will be understood that the process of this invention can also be carried out using any other type of
2S-storage protein or any other storage protein having the same or another sedimentation coefficient (e.g. a 7S-, 11S- or
12S-storage protein), provided that the DNA sequences which encode it in the plant from which it can be isolated
have been or can be identified and that non-essential or "hypervariable subsequences" therein have been or can be
detected.

Examples (by way of illustration only) of such other storage proteins consist (see also Higgins (1984) for review):

- of other albumins, which are water soluble storage proteins, which may be either 12S-like, such as the lectins
isolatable from pea and various beans, or 2S-like, such as the 2S albumins already of record or other 2S
albumins isolatable from pea, radish and sunflower;

- of globulins, which are storage proteins soluble in salt solutions, which may be either 7-8S-like, such as the
phaseolins isolatable from Phaseolus, the vicilins isolatable from pea, the conglycinins isolatable from soybean, the
oat-vicilins isolatable from oat, or 11-14S-like, such as the legumins isolatable from pea, the glycinins isolatable
from soybean, the helianthins isolatable from sunflower or other 11-14S globulins isolatable from beans,
Arabidopsis, and probably from wheat;

- of prolamins, which are alcohol soluble storage proteins, such as the zeins isolatable from corn, the hordeins
isolatable from barley, the gliadins isolatable from wheat and the kafirins isolatable from sorghum;

- of glutelins, which are storage proteins soluble under low pH conditions and isolatable from wheat.

Some of these storage proteins, merely cited by way of example, are poor in cysteines. Yet the different proteins
of a same group do show variable regions on the one hand and better conserved regions on the other hand.

Needless to say, these storage proteins could be used as suitable vectors for the production of the abovesaid
hybrid proteins and their respective purifications from the seed proteins, relying on their respective specific
solubility characteristics in the corresponding solvents.

The procedures which have been disclosed generally hereabove apply to the adequate modification of the
non-essential regions of any of said other storage proteins by a heterologous insert containing a DNA sequence encoding
the peptide of interest, and then to the transformation of the relevant plants with the chimeric gene obtained for the
production of a hybrid protein containing the sequence of the peptide of interest in the seeds of the relevant plant, and
they apply to the recovery of the peptide of interest from said plants. Needless to say, the person skilled in the art will
in all instances be able to select which of the existing techniques would best fulfill his needs at the level of each step
of the production of such modified plants, to achieve the best production yields of said peptide of interest.

The preceding discussion has been based more specifically, by way of example, on the modification of the
hypervariable region of a determined storage protein by an insert encoding a biologically active peptide. It will be
understood that the person skilled in the art may choose as insert a sequence which encodes repeats of said biologically
active peptide, wherein every sequence encoding said biologically active peptide is separated from the others by border
sequences encoding selective cleavage sites which allow their separation during purification.
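
The use of repeats separated by cleavable border sites can be pictured with the following Python sketch, given purely by way of illustration. It assembles an insert in which each copy of the peptide is bordered by a Lysine, simulates a tryptic digestion under the simplified rule given earlier (cleavage after Lys or Arg unless followed by Pro), and then removes the C-terminal basic residue as carboxypeptidase B would. The function names and the flanking residues used in the usage note are hypothetical and are not taken from any actual storage protein.

```python
def build_repeat_insert(peptide, copies, border="K"):
    """Return border + (peptide + border) * copies, e.g. K-pep-K-pep-K."""
    return border + "".join(peptide + border for _ in range(copies))


def trypsin_digest(protein):
    """Cut after K or R unless the next residue is P (simplified rule)."""
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def carboxypeptidase_b(fragment):
    """Model removal of C-terminal basic residues (Lys/Arg)."""
    return fragment.rstrip("KR")


def release_peptides(hybrid):
    """Digest the hybrid protein and trim each fragment's basic C-terminus."""
    return [carboxypeptidase_b(f) for f in trypsin_digest(hybrid)]
```

With hypothetical flanking residues, `release_peptides("PQQC" + build_repeat_insert("YGGFL", 3) + "EMQP")` yields three copies of YGGFL alongside the two flanking fragments, which would then have to be separated during purification.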

For instance the following process can be used in order to exploit the capacity of a storage protein to be used as
a suitable vector for the production in seeds of a determined polypeptide of interest or repeats thereof, when the
corresponding precursor-coding nucleic acid has been sequenced. Such process then comprises:

1) locating and selecting one of said relevant sequences of the precursor-coding nucleic acid which comprises a
non-essential region encoding a peptide sequence which can be modified by substituting an insert for part of it or
by inserting said insert into it, which modification is compatible with the conservation of the configuration of the
storage protein;

2) inserting a nucleic acid insert in the selected region of said precursor nucleic acid in appropriate reading frame
relationship with the non-modified parts of said relevant sequence, which insert includes a determined segment
encoding the polypeptide of interest or repeats thereof and, downstream and upstream of said determined
segment, suitable nucleotides, codons or triplets of nucleotides which, after said insertion into the precursor-coding
nucleic acid has been achieved, participate in the formation of codons encoding aminoacid junctions linking the
polypeptide of interest or its individual repeats to each other and to the relevant parts of the storage protein or
sub-unit thereof, whereby said amino-acid junctions define border sites surrounding the peptide of interest and
which can themselves be selectively cleaved, e.g. by specific peptidases;

3) inserting the modified precursor-coding nucleic acid obtained in a plasmid suitable for the transformation of plant
cells which can be regenerated into full seed-forming plants, wherein said insertion is brought under the control of
regulation elements, particularly a seed specific promoter capable of providing for the expression in the seeds of
said plants of the open-reading frames associated therewith;

4) transforming a culture of such plant cells with such modified plasmid;

5) assaying the expression of the chimeric storage protein having inserted into its hypervariable region the
determined sequence of the segment encoding the polypeptide of interest or the repeats thereof and, when achieved,

6) regenerating said plants from the transformed plant cells obtained and growing said plants up to the
seed-forming stage;

7) recovering the seeds and extracting the storage proteins contained therein;

8) cleaving said storage proteins, e.g. with said specific peptidases, isolating and recovering the peptide of interest.

In the case of storage 2S-proteins which contain a substantial number of cysteine residues, which storage proteins
are preferred at the present time, and further when the precursor-coding nucleic acids of several similar proteins
performing the same functions in different plants, yet originating from said different plants respectively, are available
and have been (or can be) sequenced, step 1) of the general process defined above may be carried out as follows (it being
understood that the sequence of steps recited hereafter is optional and can be replaced by any other procedure aiming
at achieving the same result). Said "step 1" then comprises:

a) selecting several of said plant storage proteins, available and identifiable in several seed forming plant species
respectively;

b) locating the precursor-coding nucleic acid sequence which in each of said plant species encodes the precursor
of said plant storage protein and determining in said precursor-coding nucleic acid a relevant nucleotide sequence
consisting of a sequence encoding the mature storage protein or an appropriate sub-sequence encoding a
sub-unit of said mature storage protein;

c) determining the relative positions of the codons which encode the successive cysteine residues in said mature
protein or protein sub-units and identifying the corresponding successive nucleic acid regions located upstream of,
between, and downstream of said codons within said sub-sequences of the precursor-coding nucleic acid and
identifying in said successive regions those parts which undergo variability in either aminoacid sequence or length
or both from one plant species to another as compared with those other regions which do exhibit substantial
conservation of aminoacid sequence in said several plant species, one of said nucleotide regions being then selected
for the insertion therein of the nucleic acid insert including the segment encoding the peptide of interest or repeats
thereof, e.g. as disclosed under 2) hereabove.

Hence the last mentioned embodiment of the invention provides that, in having the heterologous polypeptide of interest
or repeats thereof made as part of a hybrid protein in a plant, it will pass the plant protein disulfide isomerase during
membrane translocation, thus increasing the chances that the correct disulfide bridges be formed in the hybrid
precursor as in its normal precursor situation, on the one hand, and that the polypeptide of interest or repeats thereof be
protected against the different drawbacks which have been recalled above as concerns the standard genetic engineering
techniques for producing foreign peptides in host microorganisms, on the other hand.
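
The comparison recited under c) hereabove can be sketched, by way of illustration only, as follows. The Python code splits each mature protein sequence at its cysteine residues, aligns the resulting regions by their order of occurrence, and reports for each region the spread in length and the number of distinct sequences observed across species. It assumes that all sequences compared carry the same number of cysteines; the function names and the toy sequences in the usage note are invented for the illustration and are not actual 2S albumin sequences.

```python
def cysteine_segments(protein):
    """Regions upstream of, between, and downstream of the cysteines."""
    return protein.split("C")


def segment_variability(proteins):
    """Compare cysteine-delimited regions across species.

    `proteins` maps a species name to a mature protein sequence; all
    sequences are assumed to have the same number of cysteines.
    """
    split = [cysteine_segments(seq) for seq in proteins.values()]
    if len({len(s) for s in split}) != 1:
        raise ValueError("sequences differ in cysteine count")
    report = []
    for idx, segs in enumerate(zip(*split)):
        lengths = [len(s) for s in segs]
        report.append({
            "region": idx,
            "lengths": lengths,
            "length_spread": max(lengths) - min(lengths),
            "distinct": len(set(segs)),
        })
    return report


def most_variable_region(proteins):
    """Index of the region with the largest length spread, then diversity."""
    best = max(segment_variability(proteins),
               key=lambda r: (r["length_spread"], r["distinct"]))
    return best["region"]
```

For three toy sequences such as `AQCEQLRCMAGGSPQCRR`, `AQCEQLKCMPQCRK` and `SQCEQLRCMAGGSGSPQQCRR`, the region between the second and third cysteines (index 2) varies in length from 3 to 10 residues and would be the candidate retained under this crude criterion; any such candidate remains subject to experimental verification.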

The invention further refers to the recombinant nucleic acids themselves for use in the process of the invention;
particularly to the

- recombinant precursor-encoding nucleic acid defined in the frame of said process;

- recombinant nucleic acids containing said modified precursor-coding nucleic acid under the control of a
seed-specific promoter, whether the latter originates from the same DNA as that of said precursor-coding nucleic acid or
from a DNA of another plant;

- vectors, more particularly plant plasmids, e.g. Ti-derived plasmids, modified by any of the preceding recombinant
nucleic acids for use in the transformation of the above plant cells.

The chimeric gene should be provided with a suitable signal sequence if it does not possess one (which all storage
proteins do).

The invention also relates to the regenerable source of a polypeptide of interest, which is formed of either plant cells
of a seed-forming plant, which plant cells are capable of being regenerated into the full plant, or seeds of said
seed-forming plants, wherein said plants or seeds have been obtained as a result of one or several generations of the plants
resulting from the regeneration of said plant cells, wherein further the DNA supporting the genetic information of said
plant cells or seeds comprises a nucleic acid or part thereof, including the sequences encoding the signal peptide,
which can be transcribed in the mRNA corresponding to the precursor of a storage protein of said plant, placed under
the control of a seed specific promoter, and

wherein said nucleic acid sequence contains a relevant modified sequence encoding the mature storage protein or
one of the several sub-sequences encoding the corresponding one or several sub-units of said mature storage
protein,

wherein further the modification of said relevant sequence takes place in one of its non-essential regions and
consists of a heterologous nucleic acid insert forming an open-reading frame in reading phase with the non-modified parts
which surround said insert in the relevant sequence,

wherein said insert includes a nucleotide segment encoding said polypeptide of interest,

wherein said heterologous nucleotide segment is linked to the adjacent extremities of the surrounding non-modified
parts of said relevant sequence by one or several codons whose nucleotides belong either to said insert or to the
adjacent extremities or to both,

wherein said one or several codons encode one or several aminoacid residues which define selectively cleavable
border sites surrounding the peptide of interest in the hybrid storage protein or storage protein sub-unit encoded by
the modified relevant sequence.

It is to be considered that, although the invention should not be deemed as being limited thereto, the nucleic acid
inserts encoding the polypeptide of interest or repeats thereof will in most instances be man-made synthetic
oligonucleotides or oligonucleotides derived from viral or bacterial genes or from cDNAs derived from viral or bacterial RNAs,
or further from non-plant eucaryotic genes, all of which shall normally escape any possibility of being inserted at the
appropriate places of the plant cells or seeds of this invention through biological processes, whatever the nature
thereof. In other words, these inserts are usually "non plant variety specific", especially in that they can be inserted in
different kinds of plants which are genetically totally unrelated and thus incapable of exchanging any genetic material
by standard biological processes, including natural hybridization processes.

Thus the invention further relates to the seed forming plants themselves which have been obtained from said
transformed plant cells or seeds, which plants are characterized in that they carry said hybrid precursor-coding nucleic
acids associated with a seed promoter in their cells, said inserts however being expressed and the corresponding hybrid
protein produced mostly in the seeds of said plants.

There follows an outline of a preferred method which can be used for the modification of 2S seed storage protein
genes, their expression in transgenic plants, the purification of the 2S storage protein, and the recovery of the
biologically active peptide of interest. The outline of the method given here is followed by a specific example. It will be
understood by the person skilled in the art that the method can be suitably adapted for the modification of other 2S seed
storage protein genes.

1. Replacement or supplementation of the hypervariable region of the 2S storage protein gene by the sequence of interest.

Either the cDNA or the genomic clone of the 2S albumin can be used. Comparison of the sequences of the
hypervariable regions of the genes in figure 2 shows that they vary in length. Therefore if the sequence of interest is short
and a 2S albumin with a relatively short hypervariable region is used, the sequence of interest can be inserted.
Otherwise part of the hypervariable region is removed, to be replaced by the insert containing the segment or sequence of
interest and, if appropriate, the border codons. The resulting hybrid storage protein may be longer or shorter than the
natural, non-modified storage protein. In either case two standard techniques can be applied:
convenient restriction sites can be exploited, or mutagenesis vectors (e.g. Stanssens et al., 1987) can be used. In both
cases, care must be taken to maintain the reading frame of the message.
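
The requirement that the reading frame be maintained amounts to the condition that the net change in length caused by the replacement is a multiple of three nucleotides. A minimal Python sketch of such a check is given below by way of illustration; the function names, the coordinates and the toy sequence in the usage note are assumptions and do not correspond to the actual Brazil nut cDNA.

```python
def codons(seq):
    """Split an in-frame sequence into successive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def replace_region(cds, start, end, insert):
    """Replace cds[start:end] (0-based, end exclusive) with `insert`.

    Raises ValueError when the net length change is not a multiple of
    three, since the downstream reading frame would then be shifted.
    """
    if (len(insert) - (end - start)) % 3 != 0:
        raise ValueError("net length change is not a multiple of 3: frame shift")
    return cds[:start] + insert + cds[end:]
```

For example, replacing six nucleotides of the toy sequence `ATGGCTTGTCAACAATGCTAA` (positions 9 to 15) by the 21-nucleotide insert `AAATATGGTGGTTTCCTGAAA`, which encodes Lys-Tyr-Gly-Gly-Phe-Leu-Lys, changes the length by 15 nucleotides, so the downstream codons `TGC` and `TAA` are still read in the original frame.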

2. The altered 2S albumin coding region is placed under the control of a seed specific gene promoter. 

A seed specific promoter is used in order to ensure subsequent expression in the seeds only. This facilitates
recovery of the desired product and avoids possible stresses on other parts of the plant. In principle the promoter of the
modified 2S albumin can be used, but this is not necessary. Any other promoter serving the same purpose can be used.
The promoter may be chosen according to its level of efficiency in the plant species to be transformed. In the examples
below a lectin promoter from soybean and a 2S albumin promoter from Arabidopsis are used. If a chimeric gene is so
constructed, a signal peptide encoding region must also be included, either from the modified gene or from the gene
whose promoter is being used. The actual construction of the chimeric gene is done using standard molecular
biological techniques (see example).

3. The chimeric gene construction is transferred into the appropriate host plant. 

When the chimeric or modified gene construction is complete it is transferred in its entirety to a plant transformation
vector. A wide variety of these, based on disarmed (non-oncogenic) Ti-plasmids derived from Agrobacterium
tumefaciens, are available, both of the binary and cointegration forms (De Blaere et al., 1987). A vector including a
selectable marker for transformation, usually antibiotic resistance, should be chosen. Similarly, the methods of plant
transformation are also numerous, and are fitted to the individual plant. Most are based on either protoplast transformation
(Marton et al., 1979) or transformation of a small piece of tissue from the adult plant (Horsch et al., 1985). In the
example below, the vector is a binary disarmed Ti-plasmid vector, the marker is kanamycin resistance, and the leaf disc
method of transformation is used.

Calli from the transformation procedure are selected on the basis of the selectable marker and regenerated to adult
plants by appropriate hormone induction. This again varies with the plant species being used. Regenerated plants are
then used to set up a stable line from which seeds can be harvested.

4. Recovery of biologically active polypeptides. 

The purification of 2S plant albumins is well established (Youle and Huang, 1981; Ampe et al., 1986). They are major
proteins in mature seeds and highly soluble in aqueous buffers. A typical purification of 2S-storage proteins involves the
following steps: 1, homogenization of seed in dry ice and extraction with hexane; 2, extraction with high salt buffer and
dialysis against distilled water, precipitating the contaminating globulins; 3, further purification of the water soluble
fraction by gel-filtration chromatography, which separates the smaller 2S-storage proteins from the larger contaminants;
and 4, final purification by ion-exchange chromatography. The exact methods used are not critical to the technique
described here, and a wide range of classical techniques, including gel filtration, ion exchange and reversed phase
chromatography, and affinity or immunoaffinity chromatography, may be applied both to purify the chimeric 2S albumin
and, after it is cleaved from the albumin, the biologically active peptide. The exact techniques used for this cleavage will
be determined by the strategy decided upon at the time of the design of the flanking sequences (see above). As 2S
albumins are somewhat resistant to proteases, denaturation steps should often be included before protease treatment
(see example).

5. Assays for biologically active peptides. 

Assays for the recovered product are clearly dependent on the product itself. For initial screening of plants,
immunological assays can be used to detect the presence of the peptide of interest. Antibodies against the desired product
will often function even while it is still part of the hybrid 2S protein. If not, it must be partially or completely liberated
from the hybrid, after which peptide mixtures can be used. The screening with antibodies can be done either by classical
ELISA techniques (Engvall and Pesce, 1978) or on nitrocellulose blots of proteins previously separated
by polyacrylamide gel electrophoresis (Western blotting, Towbin et al., 1979). The purified peptide can be further
analysed and its identity confirmed by amino acid composition and sequence analysis.

Bioassays for biological activity will of course depend upon the nature and function of the final peptide of interest. 

It is to be understood that the present invention is also applicable to the production of labeled proteins, which may
be biologically active, using the plant seed storage proteins as suitable vectors. In this case, plant regeneration of the
obtained transformants, as described under point 3 hereabove, has to occur under conditions in which labeled carbon
sources (¹³C) and/or nitrogen sources (¹⁵N) and/or hydrogen sources (²H) and/or sulphur sources (³⁵S) and/or
phosphorus sources (³²P) are provided to the transformed growing plants (Kollman et al., 1979; Jung and Jettner, 1972;
De Wit et al., 1978).

Further characteristics of the invention will appear in the course of the non-limiting disclosure of specific examples,
particularly on the basis of the drawings in which:

Figs. 1, 1A, 2A, 2 and 3 refer to overall features of 2S-storage proteins as already discussed above.

Fig. 4 represents part of the sequence of the Brazil nut 2S-albumin obtained from the pBN2S1 plasmid obtained as
indicated hereafter and related elements.

Fig. 5 represents restriction sites used in the constructions shown in other drawings.

Figs. 6 and 7 show diagrammatically the successive phases of the construction of a chimeric plasmid including a
restriction fragment containing the nucleic acid encoding a precursor (the herein so-called "precursor-coding
nucleic acid"), the whole suitable for modification by an insertion of DNA sequences encoding a polypeptide of
interest, particularly through site-directed mutagenesis.

Fig. 8 shows the restriction sites and genetic map of a plasmid suitable for the performance of the above
site-directed mutagenesis.

Fig. 9 shows diagrammatically the different steps of the site-directed mutagenesis procedure of Stanssens et al.
(1987) as generally applicable to the modification of nucleic acid at appropriate places.

Figs. 10, 11 and 12 illustrate diagrammatically the further steps of the modification of the abovesaid chimeric
plasmid including said precursor nucleic acid to include therein, in a non-essential region of its precursor nucleic acid
sequence, an insert encoding a polypeptide of interest, Leu-enkephalin by way of example in the following
disclosure.

Fig. 13 represents the sequence of a 1 kb fragment containing the Arabidopsis thaliana 2S albumin gene and shows
related elements.

Fig. 14 provides the protein sequence of the large sub-unit of the above Arabidopsis 2S protein together with
related oligonucleotide sequences.

Fig. 15 represents the restriction map of pGSC1703.

Fig. 16 represents the restriction map of pGSC1703A.

Fig. 17A represents a chromatogram of an aliquot of the synthetic peptide YGGFLK, used as a marker, on a C4
column. The gradient (dashed line) is isocratic at 0% solvent B between 0 and 5 minutes, and solvent B increases to
100% in 70 minutes. Solvent A: 0.1% TFA in water; solvent B: 0.1% TFA in 70% CH3CN.

Fig. 17B represents a chromatogram of a tryptic digestion on oxidized 2S under the same conditions as in
Fig. 17A. The hatched peak was collected and subjected to further purification.

Fig. 18A represents a chromatogram of an aliquot of the synthetic peptide YGGFLK, used as a marker, on a C18
column. The gradient (dashed line) is isocratic at 0% solvent B between 0 and 5 minutes, and solvent B increases
to 100% in 70 minutes. Solvent A: 0.1% TFA in water; solvent B: 0.1% TFA in 70% CH3CN.

Fig. 18B represents the rechromatography on the C18 column of the YGGFLK containing peak obtained from
HPLC on the C4 column (see Fig. 17B). The running conditions are the same as for Fig. 18A.

Fig. 19 represents the results of the aminoacid sequence determination on YGGFLK. The left corner box shows
the standard of PTH-amino acids (20 pmol each). The signal for cycles 1 to 6 is 8 times more attenuated than the
reference.

Fig. 20A represents a chromatogram showing the YGGFL peptide used as marker. This peptide is the result of a
carboxypeptidase B digestion on the synthetic peptide YGGFLK. The running conditions are the same as in
Fig. 17A.

Fig. 20B shows the isolation of the YGGFL peptide, indicated with *, after carboxypeptidase B digestion on the
YGGFLK peptide that has been isolated from the plant material.

Fig. 21 shows diagrammatically the successive phases of the construction of a chimeric Arabidopsis thaliana 2S
albumin gene, including the deletion of practically all of the hypervariable region and its replacement by an AccI
site, the insertion of the sequences encoding the GHRF and cleavage sites, given by way of example in the following
disclosure, in the AccI site, particularly through site-directed mutagenesis, and the cloning of said chimeric gene
in a plant vector suitable for plant transformation.

Fig. 22A shows the eight oligonucleotides used in the constructions of the GHRFS and GHRFL genes. The limits
of the oligonucleotides are indicated by vertical lines, and the numbers above and below said oligonucleotides
indicate their number. In oligonucleotides 4 and 8 the bases enclosed in the box are excluded, resulting in the gene
encoding GHRFS. The peptide sequence of said GHRFS and GHRFL and the methionine sequences providing the
CNBr cleavage sites are shown above the DNA sequence.

Fig. 22B shows the AccI site of the modified AT2S1 gene and the insertion of said GHRF's in said AccI site in such
a way that the open reading frame is maintained.

Example I : 

As a first example of the method described, a procedure is given for the production of Leu-enkephalin, a
pentapeptide with opiate activity in the human brain and other neural tissues (Hughes et al., 1975a). A synthetic oligomer
encoding the peptide and specific protease cleavage sites is substituted for part of the hypervariable region in a cDNA
clone encoding the 2S albumin of Bertholletia excelsa (Brazil nut). This chimeric gene is fused to a fragment containing the
promoter and signal peptide encoding regions of the soybean lectin gene. Lectin is a 7S albumin seed storage protein
(Goldberg et al., 1983). The entire construct is transferred to tobacco plants using an Agrobacterium mediated
transformation system. Plants are regenerated, and after flowering the seeds are collected and the 2S albumins purified. The
enkephalin peptide is cleaved from the 2S albumin using the two specific proteases whose cleavage sites are built into
the oligonucleotide, and then recovered using HPLC techniques.

1. cDNA synthesis and screening.

Total RNA is isolated from nearly mature seeds of the Brazil nut using the method described by Harris and Dure
(1981). Poly A+ RNA is then isolated using oligo dT chromatography (Maniatis et al., 1982). cDNA synthesis and
cloning can be done using any of several published methods (Maniatis et al., 1982; Okayama and Berg, 1982; Land et al.,
1981; Gubler and Hoffman, 1983). In the present case, the 2S albumin from Brazil nut was sequenced (Ampe et al.,
1986), and an oligonucleotide based on the amino acid sequence was constructed. This was used to screen a cDNA
library made using the method of Maniatis et al. (1982). The resulting clone proved to be too short, and a second library
was made using the method of Gubler and Hoffman (1983) and screened using the first, shorter cDNA clone. A DNA
recombinant containing the Brazil nut 2S-albumin sequence was isolated. The latter was further cloned in plasmid
pUC18 (Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Gene 33, pp. 103-119).

The recovered plasmid was designated pBN2S1. The derived protein sequence, the DNA sequence, the region to
be substituted, and the relevant restriction sites are shown in fig. 4.

The deduced protein sequence (obtained from plasmid pBN2S1) is shown above the DNA sequence, and the proteolytic
processing sites are indicated (in fig. 4). The end of the signal sequence is also indicated. Restriction sites used
in the constructions in figures 6, 7, 10, 11 and 12 are indicated. The polylinker of the cloning vector is shown in order to
indicate the PstI site used in the latter part of the construction. The protein and DNA sequences of the peptide to be
inserted are shown below the cDNA sequence, as well as the rest of the oligonucleotide to be used in the mutagenesis.
During the mutagenesis procedure the oligonucleotide shown is hybridized to the opposite strand of the cDNA (see figure
10).

2. Construction of a chimeric gene.

The 2S albumin gene is first fused to the DNA fragment encoding the promoter and signal peptide of the soybean
lectin gene. The cleavage point of the signal peptide in both lectin and Brazil nut is derived from standard consensus
sequences (Perlman and Halvorson, 1983). The relevant sequences are shown hereafter as well as in figure 4.

The protein and double stranded DNA sequences in the regions of the signal peptide/mature protein sequences
in the plasmids pLE1, pSoyLea1 and pBN2S1 are shown in figure 5. The positions and recognition sites of the restriction
enzymes used in the constructions shown in the drawings are indicated. * indicates the protein cleavage site at the end
of the signal sequence.

The starting point for the construction is the plasmid pLE1 (Okamuro et al., 1987), which contains a soybean
genomic HindIII fragment. This fragment includes the entire soybean lectin gene, its promoter, and sequences
upstream of the promoter which may be important for seed specific expression. From this fragment a suitable soybean
lectin promoter/signal sequence cassette was constructed as shown in fig. 6a. A DdeI site is present at the end of the
sequence encoding the signal sequence (SS), and its cleavage site (C/TCAG) corresponds to the processing site. To
obtain a useful restriction site at this processing site, a KpnI-DdeI fragment of the SS sequence (hereafter designated
as "ss") is isolated from pLE1 and cloned into pLK57 (Botterman, 1986) itself linearized with KpnI and BglII. The DdeI
and BglII ends are filled in with Klenow DNA Polymerase I. This reconstructs the BglII site (A/GATCT), whose cleavage
site now corresponds to the signal sequence processing site (see fig. 6, 7a). The plasmid so obtained, pSoyLea1, thus
consists of plasmid pLK57 in which the KpnI-DdeI fragment of the SS sequence (ss) initially contained in pLE1 is substituted
for the initial KpnI-BglII fragment of pLK57. A HindIII site is placed in front of this fragment by substituting a KpnI-PstI
fragment containing said HindIII site from pLK69 (Botterman, 1986) for the PstI-KpnI fragment designated by (1) in
pSoyLea1 as shown diagrammatically in fig. 4. This intermediate construction is called pSoyLea2. In a second step the
lectin promoter is reconstructed by inserting the HindIII-KpnI fragment (2) of pLE1 in pSoyLea2. As there is another
BglII site present upstream of the promoter fragment, the lectin promoter/signal sequence cassette is now present as
a BglII-BglII fragment in the plasmid pSoyLea3.
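The way the fill-in step recreates a BglII site can be checked on the recognition sequences alone. The following is a minimal sketch in Python, assuming only the two sites quoted in the text (C/TCAG for DdeI and A/GATCT for BglII); the helper names are hypothetical and the flanking plasmid sequences are deliberately left out.

```python
# Sketch: blunt-end joining of a filled-in DdeI end to a filled-in BglII end.
# Only the recognition sequences quoted in the text are used; flanking
# plasmid sequence is omitted, and the helper names are illustrative.

DDEI_SITE = "CTCAG"    # cut as C/TCAG on the top strand
BGLII_SITE = "AGATCT"  # cut as A/GATCT on the top strand


def filled_upstream_end(site: str, cut: int) -> str:
    """Top-strand end of the upstream fragment after a 5' overhang is filled in.

    For a symmetric cut, the bottom strand is cut `cut` bases from the other
    side of the site, so the filled-in top strand runs up to len(site) - cut.
    """
    return site[: len(site) - cut]


def filled_downstream_end(site: str, cut: int) -> str:
    """Top-strand start of the downstream fragment after fill-in."""
    return site[cut:]


upstream = filled_upstream_end(DDEI_SITE, cut=1)       # end of the ss fragment
downstream = filled_downstream_end(BGLII_SITE, cut=1)  # start of the vector arm
junction = upstream + downstream

print(upstream, downstream, junction)
print("BglII site restored:", BGLII_SITE in junction)
```

The A contributed by the filled-in DdeI overhang supplies the first base of the new AGATCT site, which is why the BglII cleavage point ends up next to the signal sequence processing site.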

This cassette is now fused, in register, with a 205 bp Brazil nut cDNA fragment of plasmid pBN2S1 containing
the coding sequences for the Brazil nut pro-2S albumin (i.e., the entire precursor molecule with the exception of the signal
sequence). This is done as shown in figure 5. The 205 bp fragment obtained after digestion of the cDNA clone
pBN2S1 (fig. 4) with BglI, treatment with Klenow DNA Polymerase I to resect the BglI protruding ends, and digestion
with PstI is cloned into pUC18 (Yanisch-Perron et al., 1985) which has been linearized by digestion with SmaI and PstI.
The resulting plasmid, pUC18-BN1, is digested with both EcoRI and AvaI, both ends filled in, and religated. This yields
a new plasmid, designated pUC18-BN2, containing the desired Brazil nut coding sequence with
an EcoRI site at the beginning (fig. 7).

To fuse the Brazil nut coding sequences in register to the lectin promoter/signal sequence cassette, pUC18-BN2 is
digested with EcoRI and the ends partially filled in using Klenow enzyme in the presence of dATP alone. The remaining
overhanging nucleotides are removed with S1 nuclease, after which a PstI digest is carried out. This yields a fragment
with one blunt end and one PstI digested end. The lectin promoter/signal sequence fragment is taken from pSoyLea1
(fig. 7) as an EcoRI-BglII fragment with filled in BglII ends. The two fragments are ligated together with PstI-EcoRI
digested pUC18. This results in pUC18SLBN1, with a reconstructed BglII site at the junction of the signal peptide
encoding sequence and the Brazil nut sequences (fig. 7). pUC18SLBN1 thus consists of the pUC18 plasmid in which
there have been inserted the BglII-EcoRI fragment (shown by (3) on fig. 6) of pSoyLea1 and, upstream thereof in the
direction of transcription, the EcoRI-Pst-EcoRI fragment supplied by pUC18BN2 and containing the 205 bp cDNA coding
sequence for the Brazil nut pro-2S albumin.

However, the reading frame is not properly maintained. In order to correct this, the plasmid is linearized with BglII,
treated with S1 nuclease, and religated. This intermediate is designated pUC18SLBN2. The construction is finally completed
in two steps by inserting the KpnI fragment carrying the 5' part of the promoter from pSoyLea3, yielding
pUC18SLBN3, and inserting into the latter the PstI fragment containing the 3' part of the Brazil nut cDNA from pBN2S1.
The resulting final construction, pUC18SLBN4, contains the lectin promoter/signal sequence - Brazil nut cDNA sequence
fusion contained within a BamHI fragment.

3. Substitution of part of the hypervariable region with sequences encoding enkephalin and protease cleavage sites. 

The Leu-enkephalin peptide has the sequence Tyr-Gly-Gly-Phe-Leu (Hughes et al., 1975b). In order to be able to
recover the intact polypeptide from the hybrid 2S albumin after purification, codons encoding lysine are placed on
either side of the enkephalin coding sequences. This allows the subsequent cleavage of the enkephalin polypeptide
from the 2S albumin with the proteases endo-Lys-C and carboxypeptidase B in the downstream processing
steps. Finally, in order for the oligonucleotide to be capable of hybridizing to the gapped duplex molecule during mutagenesis
(see below), extra sequences complementary to the Brazil nut sequences to be retained are included. The
exact sequence of the oligonucleotide, determined after the study of codon usage in several plant storage protein
genes, is

5'-GCAACAGGAGAAGTACGGTGGATTCTTGAAGCAGATGCG-3'

The substitution of part of the sequence encoding the hypervariable region of the Brazil nut 2S albumin is done
using site-directed mutagenesis with the oligonucleotide as primer (figs. 4 and 10). The system of Stanssens et al.
(1987) is used.
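The design of the oligonucleotide can be checked by translating it in all three forward frames and looking for the Lys-flanked enkephalin sequence. The sketch below is illustrative only: it assumes the standard genetic code, and the function names are hypothetical rather than part of the described method.

```python
# Sketch: locate the reading frame in which the mutagenic oligonucleotide
# encodes Lys-Tyr-Gly-Gly-Phe-Leu-Lys. Standard genetic code assumed.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Oligonucleotide exactly as quoted in the text (5' to 3').
OLIGO = "GCAACAGGAGAAGTACGGTGGATTCTTGAAGCAGATGCG"


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons of `dna`, starting at offset `frame`."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def find_frame(dna: str, motif: str):
    """Return the first forward frame whose translation contains `motif`."""
    for frame in range(3):
        if motif in translate(dna, frame):
            return frame
    return None


frame = find_frame(OLIGO, "KYGGFLK")
print(frame, translate(OLIGO, frame))
```

In the frame found, the enkephalin codons sit between two lysine codons, and the residues on either side correspond to the flanking sequences included so that the oligonucleotide can anneal to the gapped duplex.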

The Stanssens et al. method is illustrated in fig. 9, and recalled hereinafter. It makes use of plasmid pMac5-8 whose
restriction and genetic map is shown in fig. 8 and whose main features are also recalled hereinafter.

The positions of the relevant genetic loci of pMac5-8 are indicated in fig. 8. The arrows denote their functional orientation.
fdT: central transcription terminator of phage fd; F1-ORI: origin of replication of filamentous phage f1; ORI:
ColE1-type origin of replication; BLA/ApR: region coding for β-lactamase; CAT/CmR: region coding for chloramphenicol
acetyl transferase. The positions of the amber mutations present in pMc5-8 (the bla-am gene does not contain the ScaI
site) and pMa5-8 (cat-am; the mutation eliminates the unique PvuII site) are indicated. Suppression of the cat amber
mutation in both supE and supF hosts results in resistance to at least 25 µg/ml Cm. pMc5-8 confers resistance to ±20
µg/ml and 100 µg/ml Ap upon amber-suppression in supE and supF strains respectively. The EcoRI, BalI and NcoI sites
present in the wild-type cat gene (indicated with an asterisk) have been removed using mutagenesis techniques.

The principle of the Stanssens method, as applied to the substitution of the Leu-enkephalin peptide for the
selected hypervariable region of the 2S albumin exemplified here, is first recalled hereafter.

Essentially, the mutagenesis round used for the above-mentioned substitution is run as follows. Reference is
made to fig. 9, in which the amber mutations in the Ap and Cm selectable markers are shown by closed circles. The
mutagenic oligonucleotide is represented by its own symbol in the figure. The mutation itself is indicated by an arrowhead.

The individual steps of the process are as follows:

- Cloning of the target DNA fragment into pMa5-8 (I). This vector carries an amber mutation in the CmR gene and
specifies resistance to ampicillin.
- Preparation of single stranded DNA of this recombinant (II) from pseudoviral particles.
- Preparation of a restriction fragment from the complementary pMc-type plasmid (III). pMc-type vectors contain the
wild-type CmR gene while an amber mutation is incorporated in the Ap resistance marker.
- Construction of gap duplex DNA (hereinafter called gdDNA) (IV) by in vitro DNA/DNA hybridization. In the
gdDNA the target sequences are exposed as single stranded DNA. Preparative purification of the gdDNA from the
other components of the hybridization mixture is not necessary.
- Annealing of the synthetic oligonucleotide to the gdDNA (V).
- Filling in the remaining gaps and sealing of the nicks by a simultaneous in vitro DNA polymerase/DNA ligase reaction
(VI).
- Transformation of a mutS host, i.e., a strain deficient in mismatch repair, selecting for Cm resistance. This results
in production of a mixed plasmid progeny (VII).
- Elimination of progeny deriving from the template strand (pMa-type) by retransformation of a host unable to suppress
amber mutations (VIII). Selection for Cm resistance results in enrichment of the progeny derived from the
gapped strand, i.e., the strand into which the mutagenic oligonucleotide has been incorporated.
- Screening of the clones resulting from the retransformation for the presence of the desired mutation.

In the mutagenesis experiment depicted in figure 9, Cm resistance is used as an indirect selection for the synthetic
marker. Obviously, an experiment can be set up such that the Ap selectable marker is exploited. In the latter case the
single stranded template (II) and the fragment (III) are the pMc- and pMa-type, respectively. A single mutagenesis step
not only results in introduction of the desired mutation but also in conversion of the plasmid from pMa-type to pMc-type
or vice versa. Thus, cycling between these two configurations (involving alternate selection for resistance to ampicillin
or chloramphenicol) can be used to construct multiple mutations in a target sequence in the course of consecutive
mutagenesis rounds.
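The selection logic of the retransformation step can be summarised as a small truth table. The following sketch is a deliberately simplified model, not the published protocol: it assumes only what the text states (pMa-type plasmids carry the amber mutation in the Cm marker, pMc-type plasmids in the Ap marker), and all names are illustrative.

```python
# Sketch: which plasmid types survive selection, given the host's ability
# to suppress amber mutations. Simplified model of the marker logic only.

AMBER_GENE = {"pMa": "cat", "pMc": "bla"}   # marker gene carrying the amber mutation
MARKER_DRUG = {"bla": "Ap", "cat": "Cm"}    # drug resistance conferred by each gene


def resistances(plasmid_type: str, host_suppresses_amber: bool) -> set:
    """Drugs to which a cell carrying this plasmid type is resistant."""
    resistant = set()
    for gene, drug in MARKER_DRUG.items():
        intact = AMBER_GENE[plasmid_type] != gene
        if intact or host_suppresses_amber:
            resistant.add(drug)
    return resistant


def select(progeny: list, drug: str, host_suppresses_amber: bool) -> list:
    """Keep only the plasmid types that allow growth on `drug`."""
    return [p for p in progeny if drug in resistances(p, host_suppresses_amber)]


# Mixed progeny of step (VII), retransformed into a non-suppressing host (VIII).
mixed = ["pMa", "pMc"]
print(select(mixed, "Cm", host_suppresses_amber=False))
```

Selecting for Cm in a host that cannot suppress amber mutations leaves only pMc-type progeny, i.e. plasmids derived from the gapped strand; swapping the roles of the two vectors and selecting for Ap gives the mirror-image round.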

Reverting now to the present example relative to the substitution of part of the sequence encoding the hypervariable
region of the Brazil nut 2S albumin, the Stanssens et al. system is thus applied as follows:

The PstI-EcoRI fragment of the chimeric gene containing the region of interest (see figs. 10, 11 and also fig. 4) is
inserted in a pMa vector which carries an intact beta-lactamase gene and a chloramphenicol acetyltransferase gene
with an amber mutation (fig. 10), so that the starting plasmid confers only ampicillin resistance but not chloramphenicol
resistance. Single stranded DNA (representing the opposite strand to that shown in figure 4) is prepared and annealed
with the EcoRI-PstI linearized form of a pMc type plasmid, yielding a gapped duplex molecule. The oligonucleotide is
annealed to this gapped duplex. The single stranded gaps are filled with Klenow DNA polymerase I, ligated, and the
mixture transformed into the appropriate host. Clones carrying the desired mutation will be ampicillin sensitive but chloramphenicol
resistant. Transformants resistant to chloramphenicol are selected and analyzed by DNA sequencing.
Finally, the hybrid gene fragment is inserted back into the lectin/Brazil nut chimera by replacement of the PstI-NcoI fragment
in pUC18SLBN4 with the mutagenised one from pMC58BN (fig. 11). The resulting plasmid, pUC18SLBN5, contains
the lectin promoter and signal sequence fused to a hybrid Brazil nut-enkephalin gene, all as a BamHI fragment.

4. Transformation of tobacco plants. 

The BamHI fragment containing the chimeric gene is inserted into the BamHI site of the binary vector pGSC1702
(fig. 12). This vector contains functions for selection and stability in both E. coli and A. tumefaciens, as well as a T-DNA
fragment for the transfer of foreign DNA into plant genomes (Deblaere et al., 1987). The latter consists of the terminal
repeat sequences of the octopine T-region. The BamHI site into which the fragment is cloned is situated in front of the
polyadenylation signal of the T-DNA gene 7. A chimeric gene consisting of the nopaline synthase (nos) promoter, the
neomycin phosphotransferase protein coding region (neo) and the 3' end of the OCS gene is present, so that transformed
plants are rendered kanamycin resistant. Using standard procedures (Deblaere et al., 1987), the plasmid is
transferred to the Agrobacterium strain C58C1Rif carrying the plasmid pGV2260. The latter provides in trans the vir
gene functions required for successful transfer of the T-DNA region to the plant genome. This Agrobacterium is then
used to transform tobacco plants of the strain SR1 using standard procedures (Deblaere et al., 1987). Calli are selected
on 100 µg/ml kanamycin, and resistant calli used to regenerate plants. DNA prepared from these plants is checked for
the presence of the hybrid gene by hybridization with the Brazil nut 2S albumin cDNA clone or the oligonucleotide. Positive
plants are grown and processed as described below.

5. Purification of 2S albumins from seeds. 

Positive plants are grown to seed, which takes about 15 weeks. Seeds of individual plants are harvested and
homogenized in dry ice, and extracted with hexane. The remaining residue is taken up in Laemmli sample buffer, boiled,
and put on an SDS polyacrylamide gel (Laemmli, 1970). Separated proteins are electroblotted onto nitrocellulose sheets
(Towbin et al., 1979) and assayed with a commercially available polyclonal antibody against the Leu-enkephalin antigen
(UCB cat. £ i72/001, ib72/002).

Using the immunological assays above, strongly positive plants are selected. They are then grown in larger quantities
and seeds harvested. A hexane powder is prepared and extracted with high salt buffer (0.5 M NaCl, 0.05 M Na-phosphate
pH 7.2). This extract is then dialysed against water, clarified by centrifugation (50,000 x g for 30 min), and the
supernatant further purified by gel filtration over a Sephadex G-75 column run in the same high salt buffer. The proteins
are further purified from non-ionic, non-protein material by ion exchange chromatography on a DEAE-Cellulose column.
Fractions containing the 2S protein mixture are then combined, dialysed against 0.5 % NH4HCO3, and lyophilised.

6. Recovery of Leu-enkephalin.

The mixture of purified endogenous 2S storage proteins and hybrid 2S proteins is digested with endo-Lys-C. In
order to ensure efficient proteolytic degradation, the 2S proteins are first oxidized with performic acid (Hirs, 1956). The
oxidation step opens the disulfide bridges and denatures the protein. Since Leu-enkephalin does not contain amino
acid residues which may react with performic acid, the opiate will not be changed by this treatment. Endo-Lys-C
digestion is carried out in a 0.5 % NH4HCO3 solution for 12 hours at 37°C and terminated by lyophilization. This digestion
liberates the Leu-enkephalin, but still attached to the C-terminal lysine residue. Since the hybrid protein contains
very few other lysine residues, the number of endo-Lys-C peptides is very small, simplifying further purification of the
peptide. The enkephalin-Lys peptides are purified by reversed phase HPLC using a C18 column (e.g.,
that commercialized under the trademark VYDAC). The gradient consists of 0.1 % trifluoroacetic acid as initial solvent
(A) and 70 % acetonitrile in 0.1 % trifluoroacetic acid as diluter solvent (B). A gradient of 1.5 % solvent B in A per minute
is used under the conditions disclosed by Ampe et al. (1987). The purified enkephalin-Lys peptide is identified by amino
acid analysis and/or by immunological techniques. It is further treated with carboxypeptidase B as disclosed by Ambler
(1972) in order to remove the carboxyl terminal lysine residue. Finally, the separation and purification of the opioid peptide
is achieved by reversed phase HPLC according to the method disclosed by Lewis et al.
(1979).
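The two-step release of the peptide can be illustrated on a short toy substrate. The sketch below is a simplification under stated assumptions: endo-Lys-C is modelled as cutting after every lysine, carboxypeptidase B as trimming C-terminal Lys or Arg, and the substrate is only the stretch of residues obtained by translating the mutagenic oligonucleotide with the standard genetic code, not the full hybrid 2S albumin.

```python
# Sketch: in silico endo-Lys-C digestion followed by carboxypeptidase B
# trimming. Idealised cleavage rules; function names are illustrative.


def endo_lys_c(protein: str) -> list:
    """Cleave on the C-terminal side of every lysine (K)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "K":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def carboxypeptidase_b(peptide: str) -> str:
    """Remove basic residues (Lys, Arg) from the C-terminus."""
    return peptide.rstrip("KR")


# Toy substrate: residues encoded in frame by the oligonucleotide of section 3.
substrate = "QQEKYGGFLKQM"

fragments = endo_lys_c(substrate)
print(fragments)

enkephalin_lys = [p for p in fragments if p.startswith("YGGFL")]
print([carboxypeptidase_b(p) for p in enkephalin_lys])
```

The first digest yields the enkephalin still carrying its C-terminal lysine, and the second step removes that lysine to leave Tyr-Gly-Gly-Phe-Leu, matching the order of operations described above.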

Other methods are available, as illustrated in Example II.

7. Assay of Leu-enkephalin biological activity.

Enkephalins inhibit [3H]-naloxone binding in sodium-free homogenates of guinea pig brain. Opioid activity can be
assayed as the ability to inhibit specific [3H]-naloxone binding to rat brain membranes (Pasternak et al., 1975) as previously
described (Simantov et al., 1976). One unit of opioid activity "enkephalin" was defined as that amount that yields
50 % occupancy in a 200 µl assay (Colquhoun et al., 1973).

Example II: 

As a demonstration of the flexibility of the technique, a procedure for the production of Leu-enkephalin using a different
2S albumin is given. In this case, instead of using a cDNA clone from Bertholletia excelsa as basis for the construction,
a genomic clone isolated from Arabidopsis thaliana is used. Since a genomic clone is used the gene's own
promoter is used, simplifying the construction considerably. To further demonstrate the generality of the technique, the
altered 2S albumin gene is brought to expression in three different plants: tobacco, Arabidopsis and Brassica napus, a
relative of Arabidopsis which also has a 2S albumin (see introduction). Many of the details of this example are similar
to the previous one and are thus described more briefly.

1. Cloning of the Arabidopsis thaliana 2S albumin gene.

Given the ease of purification of 2S albumin (see introduction, example 1), the most straightforward way to clone
the Arabidopsis 2S albumin gene is to construct oligonucleotide probes based on the protein sequence. The protein
sequence was determined by standard techniques, essentially in the same way as that of the Brazil nut 2S albumin
(Ampe et al., 1986). Figure 13 shows the sequence of the 1 kb HindIII fragment containing the Arabidopsis thaliana 2S
albumin gene. The deduced protein sequence is shown above the DNA sequence, and proteolytic processing sites are
indicated. The end of the signal sequence is also indicated, and SSU indicates small subunit. The protein and DNA
sequences of the peptide to be inserted are shown below the cDNA sequence, as well as the rest of the oligonucleotide
to be used in the mutagenesis. During the mutagenesis procedure the oligonucleotide shown is hybridized to the opposite
strand of the DNA sequence shown. The NdeI site used to check the orientation of the HindIII fragment during the
construction is underlined (bp-117). The numbering system is such that the A of the initiation codon is taken as base pair 1.

The difficulty in using oligonucleotide probes is that more than one codon can encode an amino acid, so that unambiguous
determination of the DNA sequence is not possible from the protein sequence. Hence the base inosine was
used at ambiguous positions. The structure of inosine is such that while it does not increase the strength of a hybridization,
it does not decrease it either (Ohtsuka et al., 1985; Takahashi et al., 1985). On this basis, three oligonucleotide
probes were designed as shown in figure 14. Figure 14 shows the protein sequence of the large sub-unit of the 2S albumin
of Arabidopsis thaliana. Under the protein sequence are the sequences of the oligonucleotides used as hybridization probes to
clone the gene. I designates inosine.
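The size of the ambiguity that inosine is meant to absorb can be estimated by counting how many DNA sequences encode a given peptide. The sketch below assumes the standard genetic code and, purely for illustration, uses the Leu-enkephalin sequence rather than the actual probe regions of figure 14, which are not reproduced in the text.

```python
# Sketch: number of distinct coding sequences for a peptide under the
# standard genetic code. The example peptide is illustrative only.
from collections import Counter
from math import prod

# One letter per codon, in TCAG order for the first, second and third base.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_RESIDUE = Counter(AMINO)


def degeneracy(peptide: str) -> int:
    """Product of the number of synonymous codons at each position."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide)


for peptide in ("YGGFL", "MW"):
    print(peptide, degeneracy(peptide))
```

A five-residue stretch can already correspond to several hundred different oligonucleotides, whereas residues such as Met and Trp contribute no ambiguity at all; this is why probe regions are usually chosen to be rich in low-degeneracy residues and why a neutral base is helpful at the remaining ambiguous positions.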

The three oligonucleotides were used to screen a genomic library of Arabidopsis DNA constructed in the phage
Charon 35 (Loenen and Blattner, 1983) using standard methods (Maniatis et al., 1982; Benton and Davis, 1977). The
oligonucleotides were kinased (Miller and Barnes, 1986), and hybridizations were done in 5X SSPE (Maniatis et al.,
1982), 0.1 % SDS, 0.02 % Ficoll, 0.02 % polyvinylpyrrolidone, and 50 µg/ml sonicated herring sperm DNA at 45°C. Filters
were washed in 5X SSPE, 0.1 % SDS at 45°C for 4-8 minutes. Using these conditions, a clone was isolated which
hybridized with all three oligonucleotide probes. Appropriate regions were subcloned into pUC18 (Yanisch-Perron et al.,
1985) using standard techniques (Maniatis et al., 1982) and sequenced using the methodology of Maxam and Gilbert
(1980). The sequence of the region containing the gene is shown in figure 13.

2. Substitution of part of the hypervariable region with sequences encoding enkephalin and protease cleavage sites. 

The gene isolated above was used directly for construction of a Leu-enkephalin/2S albumin chimera. As in the first
example, an oligonucleotide was designed incorporating the Leu-enkephalin sequence and lysine encoding codons on either side
of it, in order to be able to recover the enkephalin polypeptide in the downstream processing steps, and extra
sequences complementary to the flanking Arabidopsis sequences in order for the oligonucleotide to be able to hybridize
to the gapped duplex molecule during the mutagenesis. The resultant oligonucleotide has the sequence:

5'-CAAGCTGCCAAGTACGGTGGATTCTTGAAGCAGCACCAAC-3'

Its position in the sequence is shown in figure 8.

The region containing the gene and sufficient flanking regions to include all necessary regulatory signals is contained
on a 3.6 kb BglII fragment, inserted in the cloning vector pJB65 (Botterman et al., 1987). The clone is called
pAT2S1Bg. The region to be mutagenized is contained on a 1 kb HindIII fragment within the 3.6 kb BglII fragment, and
this smaller fragment is inserted into the HindIII site of the pMa5-8 vector of Stanssens et al. (1987) (fig. 5c). The orientation
is checked using an asymmetric NdeI site (figure 8). The mutagenesis is carried out using exactly the strategy
described in step 3 of example 1. Subsequently the hybrid gene is reinserted into the larger fragment by replacing the
original HindIII fragment with the mutagenized one using standard techniques (Maniatis et al., 1982). The orientation is
again checked using the NdeI site.

3. Transformation of plants. 

The BglII fragment containing the hybrid gene and sufficient flanking sequences both 5' and 3' to the coding region
to ensure that appropriate signals for gene regulation are present is inserted into the BamHI site of the same binary vector,
pGSC1702, used in example 1 (figure 12). This vector is described in section 4 of example 1. Transformation of
tobacco plants is done exactly as described there. The techniques for transformation of Arabidopsis thaliana and
Brassica napus are such that exactly the same construction, in the same vector, can be used. After mobilization to
Agrobacterium tumefaciens as described in section 4 of example 1, the procedures of Lloyd et al. (1986) and Klimaszewska
et al. (1985) are used for transformation of Arabidopsis and Brassica respectively. In each case, as for tobacco,
calli can be selected on 100 µg/ml kanamycin, and resistant calli used to regenerate plants. DNA prepared from such
plants is checked for the presence of the hybrid gene by hybridization with the oligonucleotide used in the mutagenesis.
(In the case of tobacco and Brassica, larger portions of the hybrid construct could be used, but in the case of
Arabidopsis these would hybridize with the endogenous gene.)

In an embodiment of the invention, the BglII fragment containing the hybrid gene and sufficient flanking sequences
both 5' and 3' to the coding region to ensure that appropriate signals for gene regulation are present is inserted into the
BglII site of the binary vectors pGSC1703 (Fig. 15) or pGSC1703A (Fig. 16). pGSC1703 contains functions for selection
in both E. coli and Agrobacterium, as well as the T-DNA fragments allowing the transfer of foreign DNA into plant
genomes (Deblaere et al., 1987). It further contains the bidirectional promoter TR (Velten et al., 1984) with the neomycin
phosphotransferase protein coding region (NPTII) and the 3' end of the ocs gene. It does not contain a gene encoding
ampicillin resistance, as pGSC1702 does, so that carbenicillin as well as claforan can be used to kill the Agrobacteria
after the infection step. Vector pGSC1703A contains the same functions as vector pGSC1703, with an additional gene
encoding hygromycin transferase. This allows the selection of the transformants on both kanamycin and hygromycin.
Transformation of tobacco plants is done exactly as described in section 4 of Example I, whereby the hybrid gene is
inserted into the plant transformation vector pGSC1703. Transformation of Arabidopsis thaliana and Brassica napus
was done with pGSC1703A in which the hybrid AT2S1 gene has been inserted. After mobilization to Agrobacterium
tumefaciens C58C1Rif carrying the plasmid pMP90 (Koncz and Schell, 1986), which provides in trans the vir
gene functions but which does not carry a gene encoding ampicillin resistance, the procedures of Lloyd et al. (1986) and
Klimaszewska et al. (1985) are used for transformation of Arabidopsis and Brassica respectively. Carbenicillin is used
to kill the Agrobacterium after co-cultivation has occurred. In each case, as for tobacco, calli can be selected on 100 µg/ml
kanamycin, and resistant calli used to regenerate plants. DNA prepared from such plants is checked for the presence
of the hybrid gene by hybridization with the oligonucleotide used in the mutagenesis. (In the case of tobacco, larger portions
of the hybrid construct could be used, but in the case of Brassica and Arabidopsis these would hybridize with the
endogenous gene.)

4. Purification of 2S albumins from seeds and further processing

Positive plants from each species are grown to seed. In the case of tobacco this takes about 15 weeks, while for
Arabidopsis and Brassica approximately 6 weeks and 3 months respectively are required. Use of different varieties may
alter these periods. Purification of 2S albumins from seeds, recovery of the Leu-enkephalin, and assaying the latter for
biological activity are done as follows.

Methods used for the isolation of Enkephalin from Arabidopsis seeds 

Two methods were used to isolate enkephalin from Arabidopsis seeds. First, a small amount of seeds isolated from
several individual transformants was screened for the presence of chimeric 2S albumins. This is done because, as
described by Jones et al. (1985), expression of introduced genes may vary widely between individual transformants.
Seeds from individual plants selected by this preliminary screening were then used to isolate larger amounts and determine
yields more accurately. Both procedures are described below.

A) Fast screening procedure for Enkephalin-containing 2S proteins 

Seeds of individual plants (approximately 50 mg) were collected and ground in an Eppendorf tube with a small plastic
grinder shaped to fit the tube. No dry ice is used in this procedure. The resulting paste was extracted three times with
1 ml of heptane and the remaining residue dried. The powder was suspended in 0.2 ml of 1 M NaCl and centrifuged for
5 min in an Eppendorf centrifuge. This extraction was repeated three times and the supernatants combined, giving a
total volume of approximately 0.5 ml. This solution was diluted 20 fold with water, giving a final NaCl concentration of
0.05 M. This was stored overnight at 4°C and then spun at 5000 rpm in a Sorvall SS-34 rotor for 40 min. The resulting
supernatant was passed over a disposable C18 cartridge (SEP-PAK, Millipore, Milford, Massachusetts, U.S.A.). The
cartridges were loaded by injecting the 10 ml supernatant with a syringe through the columns at a rate of 5 ml/min. The
cartridge was then washed with 2 ml of 0.1% TFA and proteins were desorbed by a step elution with 2 ml portions of a
0.1% TFA solution containing 7%, 14%, 21% etc. up to 70% acetonitrile. The fractions eluting in the range from 28% to
49% acetonitrile are enriched for 2S albumins as judged by SDS-polyacrylamide gel analysis performed on aliquots
taken from the different fractions. The 2S albumin-containing fractions were combined and dried in a Speed Vac concentrator
(Savant Instruments).
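The volumes and concentrations quoted in this paragraph are internally consistent, as a short calculation shows. The sketch below is simple bookkeeping using only the figures given in the text; the function and variable names are illustrative.

```python
# Sketch: dilution, loading time and step-elution series for the fast
# screening procedure, using the values quoted in the text.


def dilute(concentration_m: float, volume_ml: float, fold: float):
    """Return (final concentration in M, final volume in ml) after dilution."""
    return concentration_m / fold, volume_ml * fold


# About 0.5 ml of combined 1 M NaCl extract, diluted 20 fold with water.
final_conc_m, final_vol_ml = dilute(1.0, 0.5, 20)

# Loading the diluted extract on the C18 cartridge at 5 ml/min.
load_time_min = final_vol_ml / 5.0

# Step elution: 7%, 14%, 21% ... up to 70% acetonitrile, 2 ml per step.
steps_pct = list(range(7, 71, 7))
enriched_pct = [p for p in steps_pct if 28 <= p <= 49]

print(final_conc_m, final_vol_ml, load_time_min)
print(steps_pct)
print(enriched_pct)
```

The 20-fold dilution accounts both for the 0.05 M final salt concentration and for the 10 ml volume loaded on the cartridge, and four of the ten elution steps fall in the range reported to be enriched for 2S albumins.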

The combined fractions were reconstituted in 0.95 ml 0.1% TFA in water, filtered through an HV-4 Millex filter (Millipore),
and applied to a reversed phase C4 column 25 cm in length and 0.46 cm in diameter (Vydac 214TP54, pore size
300 angstrom, particle size 5 µm). The HPLC equipment consisted of 2 pumps (model 510), a gradient controller
(model 680) and an LC spectrophotometer detector (Lambda-Max model 481, all from Waters, Milford, Massachusetts,
U.S.A.). The gradients were run as follows: Solution A was 0.1% TFA in H2O, solution B 0.1% TFA in 70% CH3CN. For
5 minutes, a solution of 0% B, 100% A was run over the column, after which the concentration of B was raised to 100%
in a linear fashion over 70 minutes. The column eluate was monitored by absorbance at 214 nm. The fractions containing
2S albumins were collected and dried in a Speed Vac concentrator.
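The gradient programme can be written as a function of run time, which makes it easy to convert a retention time into the solvent composition at elution. The sketch below assumes an ideal gradient with no delay volume, using only the hold time, ramp time and solvent B composition stated above; the function names are illustrative.

```python
# Sketch: solvent composition during the HPLC run described in the text
# (5 min at 0% B, then a linear ramp to 100% B over 70 min; B contains
# 70% acetonitrile). Idealised: system delay volume is ignored.


def percent_b(t_min: float, hold_min: float = 5.0, ramp_min: float = 70.0) -> float:
    """Percentage of solvent B delivered at run time `t_min`."""
    if t_min <= hold_min:
        return 0.0
    return min(100.0, (t_min - hold_min) / ramp_min * 100.0)


def percent_acetonitrile(t_min: float, b_acetonitrile_pct: float = 70.0) -> float:
    """Acetonitrile percentage in the mobile phase at run time `t_min`."""
    return percent_b(t_min) * b_acetonitrile_pct / 100.0


for t in (0, 5, 40, 75):
    print(t, percent_b(t), percent_acetonitrile(t))
```

With these settings the ramp rises by roughly 1.4% B, i.e. 1% acetonitrile, per minute, so a peak eluting at 40 minutes corresponds to about 35% acetonitrile under the stated idealisation.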

In order to obtain a more complete digestion with proteases it is recommended that the proteins be denatured by
oxidizing the disulfide bridges with performic acid. This is done by adding 0.5 ml of a solution made by mixing 9 ml of
formic acid and 1 ml of 30% H2O2 at room temperature. The solution was made 2 hours before use. The reaction is
allowed to proceed for 30 min at 0°C and terminated by drying in a Speed Vac concentrator. Traces of remaining performic
acid were removed by twice adding 500 µl of water and lyophilizing the sample.

The residue was redissolved in 0.75 ml of 0.1 M Tris-HCl pH 8.5 after which 4 µg of TPCK-treated trypsin (Worthington)
was added. The reaction was placed at 37°C for 3 hours, after which it was terminated by the addition of 10 µl of
TFA and stored at -20°C prior to analysis. The resulting peptide mixture is separated by HPLC using the columns and
gradient mixtures described above. As a standard, a peptide of the same sequence as that expected (YGGFLK) was
synthesized using standard techniques on a Biolynx 4175 peptide synthesizer (LKB). This peptide was run over the column
and the retention time determined. The mixture of peptides resulting from the trypsin digest was then loaded on
the same column and peptides with the same retention time as the standard were collected, dried, and reloaded on a
C18 reversed phase column. The elution time of the marker peptide again served as a reference for the correct position
of the enkephalin containing peptide. The identity of this peptide was confirmed by amino acid sequencing, which also
allowed a rough quantitation. Four plants of the six transformants analyzed were shown to contain significant quantities
of Leu-enkephalin. By way of example the detailed analysis and processing steps are given below for one of these
four plants.

B) Larger scale isolation and processing of Enkephalin from Arabidopsis seeds

Grinding and initial extraction

2.1 1 g of seeds from said plant were ground in a mortar in dry ice. Lipids were removed from the resulting powder 
by extracting three times with 5 ml of heptane. The resulting residue was dried. 

Protein extraction

The powder was dissolved in approximately 4 ml of 1.0 M NaCl. The resulting paste was spun in an SS-34 rotor at 17,500 rpm for 40 min. After each spin the supernatant was transferred to a fresh tube and the pellet again resuspended in 4 ml of 1.0 M NaCl. This procedure was repeated three times. The three supernatants (12 ml total) were passed through a 0.45 µm filter (HA, Millipore).

Isolation of 2S albumins via gel filtration

The 12 ml of solution from the previous step was passed over a Sephadex G-50 medium (Pharmacia) column in two batches of 6 ml. The column was 2.5 cm in diameter, 100 cm in length, and run at a flow rate of approximately 27 ml/hr in 0.5 M NaCl. Fractions of approximately 7 ml were collected. The fractions were monitored for protein in two ways. First, total protein was detected by applying 10 µl of each fraction on a piece of Whatman 3MM paper, indicating the fraction numbers with a pencil. The spots are dried for 1 min in warm air and the proteins fixed by a quick (30 sec) immersion of the paper sheet in a 10% TCA solution. The sheet is then transferred to a Coomassie Blue solution similar to that used for polyacrylamide gel staining. After 1 min, the paper is removed and rinsed with tap water. Protein containing fractions show a blue spot on a white background. The minimum detection limit of the technique is about 0.05 mg/ml. Second, those fractions containing protein were assayed for the presence of 2S albumins by adding 2 µl of the 7 ml fraction to 10 µl of sample buffer and then loading 6 µl of this mixture on a 17.5% polyacrylamide minigel. Those fractions shown to contain 2S albumins were pooled; the total volume of the pooled fractions was 175 ml.
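For orientation, the column dimensions, flow rate, fraction size and detection limit given above imply the following derived quantities. These are not stated in the text; they are a back-of-the-envelope sketch that treats the column as a plain cylinder, with hypothetical function and variable names.

```python
import math


def bed_volume_ml(diameter_cm, length_cm):
    """Geometric volume of a cylindrical column bed (1 cm^3 = 1 ml)."""
    return math.pi * (diameter_cm / 2.0) ** 2 * length_cm


def linear_flow_cm_per_h(flow_ml_per_h, diameter_cm):
    """Volumetric flow divided by the column cross-section."""
    return flow_ml_per_h / (math.pi * (diameter_cm / 2.0) ** 2)


bed_ml = bed_volume_ml(2.5, 100.0)          # roughly 491 ml
hours_per_bed = bed_ml / 27.0               # roughly 18 h per bed volume at 27 ml/hr
fractions_per_bed = bed_ml / 7.0            # roughly 70 fractions of 7 ml
linear_flow = linear_flow_cm_per_h(27.0, 2.5)

# Spot test: 10 µl of a fraction at the detection limit of 0.05 mg/ml
# (0.05 µg/µl) carries about 0.5 µg of protein onto the paper.
spot_ug_at_limit = 0.05 * 10
```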

Desalting of the isolated 2S albumins

This was done via HPLC over a C4 column 25 cm in length and 0.46 cm in diameter (Vydac 214TP54, pore size 300 angstrom, particle size 5 µm). The HPLC equipment consisted of 2 pumps (model 510), a gradient controller (model 680) and an LC spectrophotometer detector (Lambda-Max model 481), all from Waters (Milford, Massachusetts, U.S.A.). 21 ml of the 175 ml were loaded on this system in 6 runs of 3.5 ml each. The gradients were run as follows: Solution A was 0.1% TFA in H2O, solution B 0.1% TFA in 70% CH3CN. For 5 minutes, a solution of 0% B, 100% A was run over the column, after which the concentration of B was raised to 100% in a linear fashion over 70 minutes. During each run the 2S albumin fraction was collected, and after all 6 runs these fractions were pooled and divided into 3 tubes, each of which therefore contained 7/175 of the 2S albumins from the 2.11 g of seeds. Each of the aliquots was processed further separately and used for quantitative estimation of yields.
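The 7/175 figure follows directly from the volumes given: 6 runs of 3.5 ml (21 ml) out of 175 ml, split over 3 tubes. A minimal sketch of that bookkeeping, using exact fractions (variable names are illustrative only):

```python
from fractions import Fraction

pooled_ml = Fraction(175)               # pooled gel filtration fractions
loaded_ml = 6 * Fraction(7, 2)          # 6 HPLC runs of 3.5 ml each = 21 ml
n_tubes = 3
seed_mass_g = Fraction(211, 100)        # 2.11 g of seeds

# Share of the total 2S albumin preparation present in each tube.
fraction_per_tube = loaded_ml / pooled_ml / n_tubes

# Seed mass that each tube represents, used when expressing yields per g seed.
seed_equivalent_g = fraction_per_tube * seed_mass_g
```

Each tube thus corresponds to 7/175 (that is, 1/25) of the preparation, or about 0.084 g of seed.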

Trypsin Digest

Prior to digestion with trypsin the three aliquots were oxidized as described above. The trypsin digest was carried out essentially as described above. 0.95 ml of 0.1 M Tris-HCl pH 8.5 was added to each aliquot, which was supplemented with 50 µg of trypsin (Worthington) and the reaction allowed to proceed for 4 hr at 37°C.
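The effect of the trypsin digest can be illustrated in silico. The sketch below applies the usual rule of thumb that trypsin cleaves on the carboxyl side of Lys and Arg unless the next residue is Pro. The flanking residues in the example sequence are hypothetical placeholders and are not the actual sequence of the chimeric 2S albumin.

```python
def trypsin_digest(seq):
    """Split a one-letter protein sequence after K or R, except before P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])    # C-terminal remainder
    return peptides


# Hypothetical context: a lysine upstream of the insert and the lysine
# that ends the YGGFLK intermediate.
example = "AEAK" + "YGGFLK" + "AEA"
peptides = trypsin_digest(example)
```

With these placeholder flanks the digest yields the YGGFLK intermediate as a separate peptide, which is the species isolated in the next step.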

Isolation of the YGGFLK peptide

The enkephalin peptide containing the carboxyl terminal lysine residue was isolated using two sequential HPLC steps. As described in the small scale isolation procedure above, a peptide of the same sequence as that expected was synthesized and run over an HPLC system using the same column and gradient conditions described in the desalting step above. The retention time of the synthetic peptide was determined (Fig. 17A). The three trypsin digests were then (separately) loaded on the same column and the material with the same retention time as that of the synthetic peptide collected (the hatched area in Fig. 17B) and dried. The same procedure was then followed using the same equipment and gradients except that a C18 column (25 x 0.46 cm, Vydac 218TP104 material of pore size 300 angstrom and particle size 10 µm) was used. Again material with the same retention time as the synthetic peptide was collected (Fig. 18A and 18B). This resulted in three preparations each derived from 7/175 of the total 2S albumin.

1/20 of the material in one of these three aliquots was used to check the sequence of the isolated peptide. This was determined by automated gas-phase sequencing using an Applied Biosystems Inc. (U.S.A.) 470A gas-phase sequenator. The stepwise liberated phenylthiohydantoin (PTH) amino acid derivatives were analyzed by an on-line PTH-amino acid analyzer (Applied Biosystems Inc. 120A). The sequenator and PTH-analyzer were operated according to the manufacturer's instructions. The HPLC-chromatograms of the liberated PTH-amino acids from cycles 1 through 6 are shown in figure 19. The sequence was, as expected, YGGFLK. The yield of PTH-amino acid of the first cycle was used to calculate the yield of this intermediate peptide (251-277 nmol/g seed).

Removal of the extra Lysine from the enkephalin

The three aliquots resulting from the previous step were resuspended in 100 µl of 0.2 M N-Ethylmorpholine pH 8.5 (Janssen Chimica, Belgium) and one third of each treated with 0.2 µg of carboxypeptidase B (Boehringer Mannheim, sequencing grade) at 37°C. The three aliquots were treated for 5, 12, and 17 minutes respectively, but all three digests proved to be equally effective. After digestion the enkephalin was purified by HPLC using the same equipment, column, and gradients as described under desalting above.
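The trimming step can be sketched as follows, on the simplifying assumption that carboxypeptidase B removes only carboxyl terminal basic residues (Lys, Arg) and stops at the first non-basic residue. The function name is hypothetical.

```python
def carboxypeptidase_b(peptide):
    """Remove C-terminal Lys/Arg residues one at a time (idealized model)."""
    while peptide and peptide[-1] in "KR":
        peptide = peptide[:-1]
    return peptide


# The YGGFLK intermediate loses its extra lysine and becomes Leu-enkephalin.
product = carboxypeptidase_b("YGGFLK")
```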

The final yield of enkephalin was determined by doing an amino acid analysis. An aliquot representing 1/150 of the total amount of the above mentioned three aliquots was hydrolyzed in 400 µl of 6N HCl, 0.05% phenol at 110°C for 24 h. The hydrolysate was dried and the amino acids derivatized into phenylthiocarbamoyl (PTC) residues (Bildingmeyer et al., 1984). Three separate aliquots of the PTC residue mixture were quantified using the PICO-TAG amino acid analysis system (Waters, Millipore, Milford, Massachusetts, U.S.A.). Yields of enkephalin peptide were calculated for each of the three samples using alpha amino-butyric acid as an internal standard. Based on an average of the three determinations a final yield of 206 nmol enkephalin/g seed was calculated.
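Comparing the final yield with the yield of the YGGFLK intermediate gives an idea of the recovery over the carboxypeptidase B and final HPLC steps. The following is a rough sketch only: the two yields were obtained by different methods (sequencing versus amino acid analysis), and the molecular weight of Leu-enkephalin (about 555.6 g/mol for C28H37N5O7) is an assumption that is not stated in the text.

```python
intermediate_nmol_per_g = (251.0, 277.0)    # YGGFLK, from first-cycle PTH yield
final_nmol_per_g = 206.0                    # Leu-enkephalin, from amino acid analysis

# Apparent recovery relative to the upper and lower intermediate estimates.
recovery_low = final_nmol_per_g / max(intermediate_nmol_per_g)
recovery_high = final_nmol_per_g / min(intermediate_nmol_per_g)

# Assumed molecular weight of Leu-enkephalin (not given in the source).
LEU_ENKEPHALIN_G_PER_MOL = 555.6
final_ug_per_g = final_nmol_per_g * LEU_ENKEPHALIN_G_PER_MOL / 1000.0
```

Under these assumptions the apparent recovery is roughly 74 to 82%, and 206 nmol/g corresponds to about 114 µg of peptide per gram of seed.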

The identity of the peptide finally obtained was verified in three ways. First, its amino acid composition showed molar ratios of Gly, 1.76; Tyr, 1.00; Leu, 1.15 and Phe, 1.02. Secondly, its retention time on a reversed phase HPLC column matched that of a reference enkephalin peptide (fig. 20), and finally its amino acid sequence was determined. These criteria unambiguously identify the peptide isolated from chimeric 2S albumins as being Leu-enkephalin.
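The composition criterion can be checked against the ratios expected for Leu-enkephalin (YGGFL) normalized to tyrosine. In this sketch the Phe value is taken as 1.02, and the helper name is hypothetical.

```python
from collections import Counter


def expected_ratios(peptide, reference="Y"):
    """Molar ratio of each residue relative to the reference residue."""
    counts = Counter(peptide)
    return {aa: n / counts[reference] for aa, n in counts.items()}


observed = {"G": 1.76, "Y": 1.00, "L": 1.15, "F": 1.02}
expected = expected_ratios("YGGFL")     # Gly 2, Tyr 1, Phe 1, Leu 1

# Difference between observed and expected molar ratios per residue.
deviations = {aa: observed[aa] - expected[aa] for aa in expected}

# Each observed ratio rounds to the expected integer ratio.
consistent = all(round(observed[aa]) == expected[aa] for aa in expected)
```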

Example III :

As a third example of the method described, a procedure is given for the production of two growth hormone releasing factor (GHRF) analogs. Synthetic and natural analogs of the originally isolated 44 amino acid peptide (Guillemin et al., 1982) in which the methionine at position 27 has been replaced by a leucine and in which the carboxyl terminus is modified in various ways or even shortened by four amino acids have been shown to be active (Kempe et al., 1986; Rivier et al., 1982). In this case two different analogs, designated hereafter as GHRFL and GHRFS, are produced. Both analogs incorporate the substitution of leucine for methionine at position 27. GHRFL is produced in such a way that the carboxyl terminus is Leu-NH2, as is found in a natural form of the peptide (Guillemin et al., 1982). GHRFS ends in Arg-Hse-NH2, where Hse stands for homoserine. This analog was shown to be biologically active by Kempe et al. (1986). Both analogs are flanked by methionine codons in the 2S albumin so that they can be cleaved out by treatment with CNBr. This is possible as neither analog contains an internal methionine. After isolation of the two peptides using HPLC techniques they are chemically modified to result in the Leu-NH2 and Arg-Hse-NH2 carboxyl termini.
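The reason for the Met27 to Leu substitution can be illustrated with a simple model of CNBr cleavage, which cuts after every methionine. The 44-residue sequence below is the human GHRF sequence quoted from general knowledge, not from this document, and should be verified against Guillemin et al. (1982); the flanking residues of the hybrid are hypothetical placeholders, and the exact carboxyl terminal arrangement of GHRFL and GHRFS is not modeled.

```python
# Assumption: human GHRF(1-44), quoted from general knowledge (verify before use).
GHRF_1_44 = "YADAIFTNSYRKVLGQLSARKLLQDIMSRQQGESNQERGARARL"


def substitute(seq, position, residue):
    """Replace the residue at a 1-based position."""
    return seq[:position - 1] + residue + seq[position:]


def cnbr_fragments(seq):
    """Split after every Met. The C-terminal 'M' of a fragment stands for the
    homoserine / homoserine lactone that CNBr actually leaves behind."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa == "M":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


analog = substitute(GHRF_1_44, 27, "L")     # Met27 -> Leu, no internal Met left
hybrid = "AAAM" + analog + "M" + "AAA"      # hypothetical flanks with Met borders
fragments = cnbr_fragments(hybrid)
```

In this model the analog is released intact with one extra carboxyl terminal residue derived from the downstream methionine, whereas the unsubstituted sequence would be split at position 27.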

A set of synthetic oligonucleotides encoding the two GHRF analogs and CNBr cleavage sites is substituted for essentially the entire hypervariable region in a genomic clone encoding the 2S albumin of Arabidopsis thaliana. Only a few amino acids adjacent to the sixth and seventh cysteine residues remain. This chimeric gene is under the control of its natural promoter and signal peptide. The process and constructions are diagrammatically illustrated in Fig. 21 and 22. The entire construct is transferred to tobacco, Arabidopsis thaliana and Brassica napus plants using an Agrobacterium mediated transformation system. Plants are regenerated, and after flowering the seeds are collected and the 2S albumins purified. The GHRF peptides are cleaved from the 2S albumin using CNBr, the cleavage sites for which are built into the oligonucleotides, and then recovered using HPLC techniques.

1. Cloning of the Arabidopsis thaliana 2S albumin gene

The Arabidopsis thaliana gene has been cloned according to what is described in Example II (see also Krebbers et al., 1988). As already of record, the plasmid containing said gene is called pAT2S1. The sequence of the region containing the gene, which is called AT2S1, is shown in figure 13.

2. Deletion of the hypervariable region of AT2S1 gene and replacement by an AccI site

Part of the hypervariable region of AT2S1 is replaced by the following oligonucleotide:

5' - CCA ACC TTG AAA GGTATACAC TTG CCC AAC - 3'    30-mer
      P   T   L   K   G  I  H   L   P   N

in which the underlined sequences represent the AccI site (GTATAC, within the unspaced block above) and the surrounding ones sequences complementary to the coding sequence of the hypervariable region of the Arabidopsis 2S albumin gene to be retained. This results finally in the amino acid sequence indicated under the oligonucleotide.
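The relationship between the 30-mer and the amino acid sequence written beneath it can be verified by translating the oligonucleotide with the standard genetic code and searching for the AccI recognition sequence (GT, then A or C, then G or T, then AC). This is a minimal sketch; the helper names are hypothetical.

```python
import re

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA sequence read in frame from its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


oligo = "CCAACCTTGAAAGGTATACACTTGCCCAAC"        # the 30-mer, spaces removed
peptide = translate(oligo)

# AccI recognition sequence GTMKAC (M = A or C, K = G or T).
acci_site = re.search(r"GT[AC][GT]AC", oligo)
```

Translation gives PTLKGIHLPN, and the single AccI site (GTATAC) lies within the codons for Gly-Ile-His.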
The deletion and substitution of part of the sequence encoding the hypervariable region of AT2S1 is done using site directed mutagenesis with the oligonucleotide as primer. The system of Stanssens et al. (1987) is used as described in Example I.

The individual steps of the process are as follows:

- Cloning of the HindIII fragment of pAT2S1 containing the coding region of the AT2S1 gene into pMa5-8 (I). This vector carries an amber mutation in the CmR gene and specifies resistance to ampicillin. The resulting plasmid is designated pMacAT2S1 (see figure 21 step 1).
- Preparation of single stranded DNA of this recombinant (II) from pseudoviral particles.
- Preparation of a HindIII restriction fragment from the complementary pMc type plasmid (III). pMc-type vectors contain the wild type CmR gene while an amber mutation is incorporated in the Ap resistance marker.
- Construction of gap duplex DNA (hereinafter called gdDNA) (IV) by in vitro DNA/DNA hybridization. In the gdDNA the target sequences are exposed as single stranded DNA. Preparative purification of the gdDNA from the other components of the hybridization mixture is not necessary.
- Annealing of the 30-mer synthetic oligonucleotide to the gdDNA (V).
- Filling in the remaining single stranded gaps and sealing of the nicks by a simultaneous in vitro Klenow DNA polymerase I/DNA ligase reaction (VI).
- Transformation of a mutS host, i.e., a strain deficient in mismatch repair, selecting for Cm resistance. This results in production of a mixed plasmid progeny (VII).
- Elimination of progeny deriving from the template strand (pMa-type) by retransformation of a host unable to suppress amber mutations (VIII). Selection for Cm resistance results in enrichment of the progeny derived from the gapped strand, i.e., the strand into which the mutagenic oligonucleotide has been incorporated.
- Screening of the clones resulting from the retransformation for the presence of the desired mutation. The resulting plasmid containing the deleted hypervariable region of AT2S1 is called pMacAT2S1C40 (see figure 21 step 2).

3. Insertion of sequences encoding GHRF into the AT2S1 gene whose sequences encoding the hypervariable region have been deleted

As stated above, when the sequences encoding most of the hypervariable loop were removed an AccI site was inserted in its place. The sequences of interest will be inserted into this AccI site, but a second AccI site is also present in the HindIII fragment containing the modified gene. Therefore the NdeI-HindIII fragment containing the modified gene is subcloned into the cloning vector pBR322 (Bolivar, 1977) also cut with NdeI and HindIII. The position of the NdeI site in the 2S albumin gene is indicated in figure 4. The resulting subclone is designated pBRAT2S1 (Figure 21, step 3). Sequences encoding the two versions of the growth hormone releasing factor are inserted into the AccI site of pBRAT2S1 by constructing a series of complementary synthetic oligonucleotides which, when annealed, form the complete sequence of the GHRF. The codon usage was chosen to approximately match that of AT2S1, a restriction site (StyI) to be used for diagnostic purposes was included, and at the ends of the GHRF encoding sequences staggered ends complementary to BamHI and PstI sites were included, along with extra bases to ensure that after the steps described below, the reading frame of the 2S albumin gene would be maintained. The eight oligonucleotides used in the two constructions are shown in figure 22. In figure 22A the limits of the oligonucleotides are indicated by the vertical lines, and the numbers above and below the sequence indicate their numbers. In oligonucleotides 4 and 8 the bases enclosed in the box are excluded, resulting in the GHRFS version of the construction. The bases marked by an * in figure 22A were found to have mutated to a T in the clone used for the further construction of GHRFL (pEK7), but as these changes did not affect the amino acid sequence the changes were not corrected. The peptide sequence of the GHRF peptide and the methionines included to provide CNBr sites are shown above the DNA sequence. The overhanging bases at each end serve to ligate the fragments into BamHI and PstI sites. These are removed by the S1 digestion. The blunt end fragment is then ligated into the Klenow treated AccI site of pBRAT2S1 as shown in Fig. 22B. The reading frame context of the AccI site is shown in the upper part of the figure, the cleavage sites being indicated by a '. The results of the manipulation are below, with the bases resulting from the AccI site and its filling in shown in bold type.

All six oligonucleotides used in each construction were kinased. For the annealing reaction 2 pmole of each oligonucleotide were combined in a total volume of 12 µl. The mixture was incubated at 90°C for 10 min, moved to a temperature of approximately 65-70°C for 10 min, and then allowed to cool gradually to 30-35°C over a period of 30-45 min. At the end of this period ligase buffer (Maniatis et al., 1982) and 1.5 units of T4-ligase were added, the volume adjusted to 15 µl and the mixture incubated overnight at 16°C. The mixture was then incubated at 65°C for 5 min, after which 2.5 µl of 100 mM NaCl restriction endonuclease buffer (Maniatis et al., 1982) and 5-10 units each of BamHI and PstI were added, and the volume adjusted to 25 µl. This digest is to cleave any concatemers which have formed during the ligation step. After digestion for 45 min the reaction was extracted with phenol/chloroform, precipitated, and resuspended in 10 µl, 5 µl of which were ligated with pUC18 (Yanisch-Perron et al., 1985) which had been digested with BamHI and PstI and treated with bacterial alkaline phosphatase. After transformation of bacterial cells by standard techniques (Maniatis et al., 1982), recombinant colonies were screened by the method of Grunstein (1975) using oligonucleotide number 1 end labeled with 32P. Clones from each version of the GHRF gene were sequenced, and one clone for each version, designated pEK7 (containing GHRFL) and pEK8 (containing GHRFS), was used in further steps (see step 4 in figure 21).

The BamHI-PstI fragments of pEK7 and pEK8 were inserted into the AccI site of pBRAT2S1 (Fig. 21, step 5). The details of the treatments done to maintain the open reading frame are shown in Fig. 22. pEK7 and pEK8 were each cut with both BamHI and PstI, treated with S1 nuclease, and the fragments containing the GHRF encoding sequences isolated after gel electrophoresis. These fragments were then separately ligated with pBRAT2S1 which had been cut with AccI and treated with the Klenow fragment of DNA polymerase I. The resulting clones were checked for the appropriate orientation of the GHRF encoding sequences by digestion with StyI, a site for which had been included in the synthetic sequences for this purpose, and HindIII. Several clones which proved to contain inserts in the correct orientation were sequenced. The latter is necessary because S1 nuclease digestion cannot always be strictly controlled. One clone for each of the two GHRF constructions confirmed to have the correct sequence was used in further steps. These were designated pEK100 and pEK200 for GHRFL and GHRFS respectively.
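The logic of the StyI/HindIII orientation check is that a site placed asymmetrically within a blunt-ligated insert lies at a different distance from a fixed vector site depending on the orientation of the insert. The sketch below illustrates this with purely hypothetical coordinates; none of the numbers are taken from the constructs described here.

```python
def diagnostic_fragment_bp(insert_start, insert_len, site_offset,
                           hindiii_pos, forward=True):
    """Length of the fragment between the StyI cut inside the insert and a
    HindIII cut downstream of it.

    site_offset is the distance of the StyI cut from the 5' end of the insert
    in the forward orientation; in the reverse orientation the same cut lies
    insert_len - site_offset from the left end of the insert."""
    offset = site_offset if forward else insert_len - site_offset
    styi_pos = insert_start + offset
    return hindiii_pos - styi_pos


# Hypothetical example: 150 bp insert at position 400, StyI cut 30 bp into
# the insert, HindIII cut at position 1000.
forward_bp = diagnostic_fragment_bp(400, 150, 30, 1000, forward=True)
reverse_bp = diagnostic_fragment_bp(400, 150, 30, 1000, forward=False)
```

With these placeholder values the two orientations give fragments differing by 90 bp, which would be distinguishable on a gel; a centrally placed site would give no such difference.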

4. Reconstruction of the complete modified AT2S1 gene with its natural promoter

The complete chimeric gene is reconstructed as follows (see figure 21): The clone pAT2S1Bg contains a 3.6 kb BglII fragment inserted in the cloning vector pJB65 (Botterman et al., 1987) which encompasses not only the 1.0 kb HindIII fragment containing the coding region of the gene AT2S1 but sufficient sequences upstream and downstream of this fragment to contain all necessary regulatory elements for the proper expression of the gene. This plasmid is cut with HindIII and the 5.2 kb fragment (i.e., that portion of the plasmid not containing the coding region of AT2S1) is isolated. The clone pAT2S1 is cut with HindIII and NdeI and the resulting 320 bp HindIII-NdeI fragment is isolated. This fragment represents that removed from the modified 2S albumin in the construction of pBRAT2S1 (step 3 of figure 21) in order to allow the insertion of the oligonucleotides in step 5 of figure 21 to proceed without the complications of an extra AccI site. These two isolated fragments are then ligated in a three way ligation with the NdeI-HindIII fragments from pEK100 and pEK200 respectively (figure 21, step 6) containing the modified coding sequence. Individual transformants can be screened to check for appropriate orientation of the reconstructed HindIII fragment within the BglII fragment using any of a number of sites. The resulting plasmids pEK502 and pEK6011 consist of a 2S albumin gene modified only in the hypervariable region, surrounded by the same flanking sequences and thus the same promoter as the unmodified gene, the entirety contained on a BglII fragment.
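The fragment sizes quoted in this section can be cross-checked with simple arithmetic. The values below are the rounded sizes given in the text; the vector size and the size of the unmodified NdeI-HindIII fragment are derived here and are not stated in the source, and `net_change_bp` is a hypothetical parameter for the net length change caused by replacing the hypervariable region.

```python
BGLII_FRAGMENT_BP = 3600        # genomic BglII fragment in pAT2S1Bg
HINDIII_FRAGMENT_BP = 1000      # HindIII fragment carrying the coding region
REMAINDER_BP = 5200             # pAT2S1Bg cut with HindIII, minus coding fragment
HINDIII_NDEI_BP = 320           # HindIII-NdeI fragment from pAT2S1

plasmid_bp = REMAINDER_BP + HINDIII_FRAGMENT_BP             # whole pAT2S1Bg
vector_bp = plasmid_bp - BGLII_FRAGMENT_BP                  # derived size of the vector part
flanking_bp = BGLII_FRAGMENT_BP - HINDIII_FRAGMENT_BP       # flanks kept with the vector
ndei_hindiii_unmodified_bp = HINDIII_FRAGMENT_BP - HINDIII_NDEI_BP


def reconstructed_plasmid_bp(net_change_bp):
    """Size after the three way ligation; net_change_bp is the (hypothetical)
    net length change of the NdeI-HindIII fragment due to the GHRF insert."""
    return (REMAINDER_BP + HINDIII_NDEI_BP
            + ndei_hindiii_unmodified_bp + net_change_bp)
```

With no net change in length the three fragments add back up to the size of the starting plasmid, which is a quick sanity check on the quoted figures.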

5. Transformation of plants

The BglII fragment containing the chimeric gene is inserted into the BglII site of the binary vector pGSC1703A (fig. 16) (see also Fig. 21 step 6), used and described in section 3 of Example II. The resultant plasmid is designated pTAD12. Using standard procedures (Deblaere et al., 1987), pTAD12 is transferred to the Agrobacterium strain C58C1Rif carrying the plasmid pMP90, also used in section 3 of Example II. This Agrobacterium is then used to transform plants. Tobacco plants of the strain SR1 are transformed using standard procedures (Deblaere et al., 1987). Calli are selected on 100 µg/ml kanamycin, and resistant calli used to regenerate plants.

The techniques for transformation of Arabidopsis thaliana and Brassica napus are such that exactly the same construction, in the same vector, can be used. After mobilization to Agrobacterium tumefaciens as described hereabove, the procedures of Lloyd et al. (1986) and Klimaszewska et al. (1985) are used for transformation of Arabidopsis and Brassica respectively. In each case, as for tobacco, calli can be selected on 100 µg/ml kanamycin, and resistant calli used to regenerate plants.






EP0 723 019 A1 



In the case of all three species at an early stage of regeneration the regenerants are checked for transformation by 
inducing callus from leaf on media supplemented with kanamycin (see also point 6). 

6. Screening and analysis of transformed plants

In the case of all three species, regenerated plants are grown to seed. Since different transformed plants can be expected to have varying levels of expression ("position effects", Jones et al., 1985), more than one transformant must initially be analyzed. This can in principle be done at either the RNA or protein level. In this case seed RNA was prepared as described in Beachy et al., 1985 and northern blots carried out using standard techniques (Thomas et al., 1980). Since in the case of both Brassica and Arabidopsis the use of the entire chimeric gene as a probe would result in cross hybridization with endogenous genes, oligonucleotide probes complementary to the insertion within the 2S albumin were used; one of the oligonucleotides used to make the construction can be used. For each species, 1 or 2 individual plants were chosen for further analysis as disclosed below.

First the copy number of the chimeric gene is determined by preparing DNA from leaf tissue of the transformed plants (Dellaporta et al., 1983) and probing with the oligonucleotide used above.

7. Isolation of GHRF analogs 

A) Purification of the chimeric 2S albumins 

The 2S albumins are purified by high salt extraction, gel filtration and reversed-phase HPLC as described in Example II.

The correct elution times of the chimeric 2S albumins are determined by immunological techniques using commercially available antibodies (UCB-Bioproducts, Drogenbos, Belgium) directed against the natural GHRF.

B) Cleavage of the chimeric 2S albumin and isolation of the GHRF analogs

The desalted, HPLC-purified GHRF containing 2S albumins are then treated with CNBr (Gross and Witkop, 1961). CNBr will liberate the GHRF analogs with an extra homoserine/homoserine-lactone still attached to the COOH-terminus. The GHRF analogs are purified using classical reversed phase HPLC techniques, as described in Example II, and their amino acid sequence is determined using the method described in Example II. The isolated GHRFS analog is amidated using ammonia, n-butylamine and n-dodecylamine as described by Kempe et al., 1986. This results in the described Arg-Hse-NH2 terminus.

The second analog, GHRFL, with an extra methionine still present at the carboxyl terminus, is first treated with carboxypeptidase B, removing the carboxyl terminal homoserine residue (Ambler, 1972). This results in a Leu-Gly-COOH terminus. Treatment with the D-amino acid oxidase in the presence of catalase and ascorbate, as described in Kreil (1984), converts the glycine-COOH terminal into the terminal amide-CONH2 and glyoxylic acid. This set of enzymatic steps results in the final amidated GHRFL analog.

The examples have thus given a complete illustration of how 2S albumin storage proteins can be modified to incorporate therein an insert encoding Leu-enkephalin or the Growth Hormone Releasing Factor, followed by the transformation of tobacco, Arabidopsis and Brassica cells with an appropriate plasmid containing the corresponding modified precursor nucleic acid, the regeneration of the transformed plant cells into corresponding plants, the culture thereof up to the seed forming stage, the recovery of the seeds, the isolation therefrom of the hybrid 2S albumin and finally the recovery of the Leu-enkephalin or the GHRF from said hybrid protein in a purified form.

It will readily be appreciated that the invention thus provides a breakthrough in the art of genetically engineering proteins or polypeptides and of producing them in considerable amounts under conditions yielding them in a configuration that comes close to their natural ones.

It goes without saying that the invention is not limited to the above examples. The person skilled in the art will in each case properly select the storage proteins to be used for the production of any determined polypeptide or peptide of interest, the nature thereof, e.g. depending on the adequate restriction sites which it contains in order to accommodate at best the corresponding DNA insert, and the choice of the most suitable seed specific promoter depending on the nature of the seed forming plant to be transformed for the sake of producing the corresponding hybrid protein from which the peptide of interest can ultimately be cleaved, recovered and purified.

There follows a list of bibliographic references which have been referred to in the course of the present disclosure to the extent that reference has been made to known methods for achieving some of the process steps referred to herein or to general knowledge which has been established prior to the performance of this invention.
It is further confirmed

that plasmid pGV2260 has been deposited with the DSM under No. 2799 in December, 1983;

plasmid pSOYLEA has been deposited with the DSM under No. 4205 on August 3, 1987;

plasmid pBN2S1 has been deposited with the DSM under No. 4205 on August 3, 1987;

plasmids pMa5-8 and pMc have been deposited with the DSM under Nos. 4567 and 4566 respectively on May 3, 1988;

plasmid pAT2S1 has been deposited with the DSM under No. 4879 on October 7, 1988;

plasmid pAT2S1Bg has been deposited with the DSM under No. 4878 on October 7, 1988;

plasmid pGSC1703A has been deposited with the DSM under No. 4880 on October 7, 1988;

plasmid pEK7 has been deposited with the DSM under No. 4876 on October 7, 1988; and

plasmid pEK8 has been deposited with the DSM under No. 4877 on October 7, 1988,

notwithstanding the fact that they all consist of constructs that the person skilled in the art can reproduce from available genetic material without performing any inventive work.
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2S Albumin As % Of Total Seed Protein 



TABLE 1

Family, species (common name)            %

Compositae
  Helianthus annuus (sunflower)         62
Cruciferae
  Brassica spp. (mustard)               62
Linaceae
  Linum usitatissimum (linseed)         42
Leguminosae
  Lupinus polyphyllus (lupin)           38
  Arachis hypogaea (peanut)             20
Lecythidaceae
  Bertholletia excelsa (brazil nut)     30
Liliaceae
  Yucca spp. (yucca)                    27
Euphorbiaceae
  Ricinus communis (castor bean)        44

From Youle and Huang, 1981

Claims 

1. A recombinant DNA comprising a chimeric gene, wherein said chimeric gene comprises in sequence:

(a) a 5' flanking region comprising a seed-specific promoter of an Arabidopsis gene encoding a 2S albumin precursor comprising the amino acid sequence of AT2S1, AT2S2, AT2S3 or AT2S4 of Fig. 2A; and
(b) a DNA coding for a polypeptide of interest.

2. The recombinant DNA of claim 1, wherein said 5' flanking region comprises the nucleotide sequence of Fig. 13 between positions -431 and -1.

3. The recombinant DNA of claim 1 or 2, wherein said 5' flanking region is contained in the plasmid pAT2S1Bg, DSM 4878 and comprises the nucleotide sequence of Fig. 13 between positions -431 and -1.

4. The recombinant DNA of claim 1 or claim 2, wherein said DNA encodes a 2S albumin precursor of: a Brassica species, particularly Brassica napus; an Arabidopsis species, particularly Arabidopsis thaliana; a Ricinus species, particularly Ricinus communis; or a Bertholletia species, particularly Bertholletia excelsa and wherein said 2S albumin precursor is modified in the region between the sixth and seventh cysteine residues.

5. The recombinant DNA of claim 4, wherein said DNA encodes a 2S albumin precursor comprising the amino acid sequence of AT2S1, AT2S2, AT2S3 or AT2S4 of Fig. 2A, which is modified in the region between the sixth and seventh cysteine residues.

6. A 5' flanking region of an Arabidopsis gene encoding a 2S albumin precursor comprising the amino acid sequence of AT2S1, AT2S2, AT2S3 or AT2S4 of Fig. 2A.

7. The 5' flanking region of claim 6 comprising the nucleotide sequence of Fig. 13 between positions -431 and -1.

8. The recombinant DNA of claim 6 or 7, wherein said 5' flanking region is contained in the plasmid pAT2S1Bg, DSM 4878 and comprises the nucleotide sequence of Fig. 13 between positions -431 and -1.

9. A 2S albumin precursor of a Brassica species, particularly Brassica napus; an Arabidopsis species, particularly Arabidopsis thaliana; a Ricinus species, particularly Ricinus communis; or a Bertholletia species, particularly Bertholletia excelsa, which is modified in the region between the sixth and seventh cysteine residues.

10. The 2S albumin precursor of claim 9, which is a 2S albumin precursor comprising the amino acid sequence of AT2S1, AT2S2, AT2S3 or AT2S4 of Fig. 2A, which is modified in the region between the sixth and seventh cysteine residues.

11. A DNA encoding the 2S albumin precursor of claim 9 or 10.

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COMPARISON OF 2S ALBUMIN PROTEIN SEQUENCES 



FIG. 2 (sequence alignment; panels: Small Subunit, Large Subunit; rows: R. comm., B. exce., B. napus, A. thali.)
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COMPARISON OF 2S ALBUMIN PRECURSOR PROTEIN SEQUENCES 

(Sequence alignment; panels: Signal Peptide, Amino Terminal Processed Fragment, Small Subunit, Internal Processed Fragment, Large Subunit, Carboxyl Terminal Processed Fragment; rows: R. comm., B. exce., B. napus, AT2S1, AT2S2, AT2S3, AT2S4)

FIGURE 3 

Labels: NH2 terminus; SIGNAL PEPTIDE; A.T.P.F.; SMALL SUBUNIT; I.P.F.; LARGE SUBUNIT; HYPERVARIABLE REGION; C.T.P.F.; COOH terminus. 
linker   A A A A L L V L M A L G H A T A F R   18 

GAATTCCGCGGCAGCAGCCCTCCTTGTCCTCATGGCCCTCGGCCACGCCACCGCCTTCCG 60 
EcoRI                                               BglI 

* Start mature small subunit 
A T V T T T V V E E E N * Q E E C R E Q M   38 

GGCCACCGTCACCACCACAGTGGTGGAGGAGGAGAACCAGGAGGAGTGTCGCGAGCAGAT 120 

Q R Q Q M L S H C R M Y M R Q Q M E E S   58 
GCAGAGACAGCAGATGCTCAGCCACTGCCGGATGTACATGAGACAGCAGATGGAGGAGAG 180 

* Processed   * Large subunit 

* P Y Q T M * P R R G M E P H M S E C C E Q   78 

CCCGTACCAGACCATGCCGAGGCGGGGAATGGAGCCGCACATGAGCGAGTGCTGCGAGCA 240 

L E G M D E S C R C E G L R M M M M R M   98 

GCTGGAGGGGATGGACGAGAGCTGCAGATGCGAAGGCTTAAGGATGATGATGATGAGGAT 300 

PstI 

Q Q E E M Q P R G E Q M R R M M R L A E   118 

GCAACAGGAGGAGATGCAACCCCGAGGGGAGCAGATGCGAAGGATGATGAGGCTGGCCGA 360 
GCAACAGGAGAAGTACGGTGGATTCTTGAAGCAGATGCG-3' oligonucleotide 
K Y G G F L K 

End mat. lg. su. * 
N I P S R C N L S P M R C P M G G S * I A   138 
GAATATCCCTTCCCGCTGCAACCTCAGTCCCATGAGATGCCCCATGGGTGGCTCCATTGC 420 

G F *   140 

CGGGTTCTGAATCTGCCACTAGCCAGTGCTGTAAATGTTAATAAGGCTCTCACAAACTAG 480 

EcoRI   polylinker 

CTCTTTGTTGGCrnTGGCCGGAGACTAGGGTGTGGGGAATTCGAGCTCGGTACCCGGGG 540 

PstI 

ATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTT 574 

FIG. 4 
pLe1 

 A   N * S   A 
GCA AAC TCA GCG 
CGT TTG AGT CGC 
DdeI 
C/TNAG 

pSOYLEA1 

 A   N * S   D   L 
GCA AAC TCA GAT CTG 
CGT TTG AGT CTA GAC 
BglII 
A/GATCT 

pBN2S1 

 T   A * F   R   A   T 
ACC GCC TTC CGG GCC ACC 
TGG CGG AAG GCC CGG TGG 
BglI 
GCCNNNN/NGGC 

FIG. 5 
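
The recognition sequences listed in FIG. 5 (C/TNAG, A/GATCT and GCCNNNN/NGGC) can be checked against the upper strands shown for each plasmid. The following Python sketch is a minimal illustration, assuming that `N` stands for any base and that the enzyme names DdeI, BglII and BglI correspond to those three recognition sequences; the helper names `site_pattern` and `find_sites` are hypothetical. All three sites are palindromic, so searching one strand is sufficient here.

```python
import re

# Recognition sequences as listed in the figure, with the cut mark "/" removed.
SITES = {
    "DdeI": "CTNAG",
    "BglII": "AGATCT",
    "BglI": "GCCNNNNNGGC",
}


def site_pattern(site):
    """Turn a recognition sequence into a regex, treating N as any base."""
    body = "".join("[ACGT]" if base == "N" else base for base in site)
    # A lookahead lets overlapping occurrences be reported as well.
    return re.compile(f"(?={body})")


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    seq = "".join(seq.split()).upper()
    return {
        name: [m.start() for m in site_pattern(site).finditer(seq)]
        for name, site in sites.items()
    }


# Upper strands as printed in the figure
print(find_sites("GCA AAC TCA GCG"))          # first panel
print(find_sites("GCA AAC TCA GAT CTG"))      # pSOYLEA1 panel
print(find_sites("ACC GCC TTC CGG GCC ACC"))  # pBN2S1 panel
```

Positions are reported on the space-stripped sequence, so index 0 is the first base of the first codon shown.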
FIG. 6 

Construction diagram. Legible labels: HindIII; DdeI; KpnI; 460 bp fragment; pSOYLEA2 (KpnI, HindIII); Ap. 




EP 0 723 019 A1 



FIG. 7 

Construction diagram. Legible labels: pBN2S1; pUC18BN1; pUC18BN2; pSOYLEA1; pSOYLEA3; pUC18SLBN1; pUC18SLBN2; pUC18SLBN3; pUC18SLBN4; SmaI; PstI; BglI; BglII; EcoRI; AvaI; KpnI; Klenow; Klenow, dATP; S1 nuclease; purification of 205 bp fragment. 
FIG. 10 

Mutagenesis scheme. Legible labels: pBN2S1; pMa5-8; pMc5-8; EcoRI-PstI; single strand preparation; gapped duplex annealing; Klenow, ligation; selection (Amp, Cam). 
FIG. 11 

Construction diagram. Legible labels: pUC18SLBN4; NcoI; EcoRI; PstI; BamHI; BglII; purified fragment; Leu-enkephalin. 

FIG. 12 
ATCTTTATCCA -421 

TATATTGTCTTACCATCAATAGACAATATCCAATGG4CCGOTGACCTGC6TGTATAA8TA -361 
ATTTTTCAAGATGCTAAAACTTTTATGTATTTCAGAATTAACCTCCAAAAACATTTATTG -301 
ACACACTACTACTCTTTCCGTATTGACTCTCAACTAGTCATTTCAAAATAATTGACATGT -241 
CAGAACATGAGTTACACATGGTTGCATATTGCAAGTAGACGCGGAA4CTTGTCACTTCCT -181 
TTACATTTGAGTTTCCAACACCTAATCACGACAACAATCATATAGCTCTCGCATACAAAC -121 
AAACATATGCATGTATTCTTACACGTGAACTCCATGCAAGTCTCTTTTCTCACCTATAAA -61 
TAC(>ACCACACCTTCACCACATTCTTCACTCGAACCAAAACATACACA(>TAGCAAAAA -1 

M A N K L F L V C A A L A L C F L L T N   20 

ATGGCAAACAAGTTGTTCCTCGTCTGCGCAGCTCTCGCTCTCTGCTTCCTCCTCACCAAC 60 

* Start SSU 

A S I Y R T V V E F E E D D A T N * P I G   40 

GCTTCCATCTACCGCACCGTCGTTGAGTTCGAAGA4GATGACGCCACTAACCCCATAGGC 120 

P K M R K C R K E F Q K E Q H L R A C Q   60 

COkAAAATGAGGAAATG<XGCMGGAGTTTCAGAAAGAA^ 180 

* Processed * 

Q L M L Q Q A R Q G R S D * E F D F E D D   80 

CAATTGATGCTCCAGCAAGCAAGGCAAGGCCGTAGCGATGAGTTTGATTTCGAAGACGAC 240 

* Large subunit 

M E N * P Q G Q Q Q E Q Q L F Q Q C C N E   100 

ATGGAGAACCCACAGGGACAACAGCAGGAACAACAGCTATTCCAGCAGTGCTGCAACGAG 300 

L R Q E E P D C V C P T L K Q A A K A V   120 
CTTCGCCAGGAAGAGCCAGATTGTGTTTGCCCCACCTTGAAACAAGCTGCCAAGGCCGTT 360 

oligonucleotide 5'-CAAGCTGCCAAGTACGGT 

K Y G 

R L Q G Q H Q P M Q V R K I Y Q T A K H   140 
AGACTCCAGGGACAGCACCAACCAATGCAAGTCAGGAAAATTTACCAGACAGCCAAGCAC 420 
GGATTCTTGAAGCAGCACCAAC-3' oligo 

G F L K 

End mat. lg. su. * 

L P N V C D I P Q V D V C P F N I * P S F   160 

TTGCCCAACGTTTGCGACATCCCGCAAGTTGATGTTTGTCCCTTCAACATCCCTTCATTC 480 

P S F Y *   164 
CCTTCTTTCTACTAAATCTCAAACAAACCCTCAAAGCGTATGAGAGTGTGGTTGTTGATA 540 

TATACATGTTGACACTTGACACATACCACACCTCATCGTGTGTTTTATGATAAATGT 597 

FIG. 13 
P Q G Q Q Q E Q Q L F Q Q C C N E L R Q E 
GGI CAI CAI CAI GAI CAI CAI ITI TTI CAI CAI TGI TGI AAI GA 

E P D C V C P T L K Q A A K A V R L Q G Q 

AAI CAI GCI GCI AAI GCI GTI IGI ITI CAI GGI CAI 

H Q P M Q V R K I Y Q T A K H L P N V C D 
CAI CA          CCI AAI GTI TGI GAI 

I P Q V D V C P F N P 

ATI CCI CAI GTI GAI GTI TGI CCI TTI AAI CC 

FIG. 14 
Figure 18 
Figure 20 
Figure 21: Flow chart of constructions showing successive steps in the deletion of sequences 
encoding most of the hypervariable region of the Arabidopsis 2S and their replacement 
with sequences encoding two GHRF analogs. 



Labels are listed in reading order under each step heading; * and + mark the GHRFL and GHRFS series respectively. 

Step 1: pmac5-8; HindIII/CIP; pat2s1; HindIII; pmacat2s1. 
Step 2: mutagenesis (deletion HV + AccI site); pmacat2s1c40; HindIII/NdeI. 
Step 3: pbr322; HindIII/NdeI; pbrat2s1; AccI/Klenow. 
Step 4: anneal oligonucleotides; pUC18; BamHI/PstI/BAP; * pEK7 (GHRFL); + pEK8 (GHRFS). 
Step 5: BamHI/PstI; S1 nuclease; * pEK100; + pEK200; NdeI/HindIII. 
Step 6: pat2s1; NdeI/HindIII; 320 bp fragment; * pEK502; + pEK601; BglII; pat2s1bg; HindIII. 
Step 7: pgsc1703a; BglII/CIP; * pEK551; + pEK651. 
Figure 22: 

A 

1 

M Y A D A I F T N S Y R K V L G Q L 

GATCCATGTACGCTGATGCAATATTCACCAACTCTTACAGGAAGGTCCTAGGCCAACTCA 
    GTACATGCGACTACGTTATAAGTGGTTGAGAATGTCCTTCCAGGATCCGGTTGAGT 

6   1   2,4 

S A R K L L Q D I L S R Q Q G E S N Q E 
GCGCTAGAAAATTGCTCCAAGACATCCTCTCACGCCAGCAGGGAGAATCCAACCAGGAGA 
CGCGATCTTTTAACGAGGTTCTGTAGGAGAGTGCGGTCGTCCCTCTTAGGTTGGTCCTCT 

R G A R A R 

GAGGCGCCCGCGCTAGGTTGGGAATGATCTGCA 
CTCCGCGGGCGCGATCCAACCCTTACTAG 
7,8 

B 

K G I H 
AAA GGT'ATA CAC 
TTT CCA TA'T GTG 

K G I M Y                    G M I I H 

AAA GGT ATC ATG TAC -- PEPTIDE -- GGA ATG ATC ATA CAC 

TTT CCA TAG TAC ATG -- SEQUENCE -- CCT TAC TAG TAT GTG 
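
The paired strands and the one-letter amino acid lines in the last panel can be cross-checked mechanically. The following Python sketch is a minimal illustration, assuming the standard genetic code and assuming that the lower line of each pair is the base-by-base complement of the upper line (written in the same left-to-right order, not reversed); the helper names `clean`, `complement` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq):
    """Strip whitespace and upper-case a DNA string."""
    return "".join(seq.split()).upper()


def complement(seq):
    """Base-by-base complement, in the same left-to-right order as the input."""
    return clean(seq).translate(COMPLEMENT)


def translate(seq):
    """Translate complete codons, reading from the first base of `seq`."""
    s = clean(seq)
    usable = len(s) - len(s) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, usable, 3))


left_flank = "AAA GGT ATC ATG TAC"   # upper strand, left of the peptide insert
right_flank = "GGA ATG ATC ATA CAC"  # upper strand, right of the peptide insert

print(translate(left_flank), complement(left_flank))
print(translate(right_flank), complement(right_flank))
```

Under these assumptions the translations reproduce the K G I M Y and G M I I H lines, and the complements reproduce the lower strands shown for both flanks.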